Condo Rent Prediction Task - Geo-Data Enrichment

The objective of the task is to build a predictive model using machine learning models that would be able to predict a rent per square meter in Hamburg city.


Additional geo-data information or features has to be fetched from the external sources and concatenate with the dataset given. The external features could be anything useful for the analysis such as places nearby, bus stops, railway stations etc.

Dataset Description

Number of variables: 15
Number of observations: 12500

  • living_space: The total area of the rented apartment
  • rooms: Number of rooms availabe for rent
  • cold_rent - The monthly rent price
  • construction_year - The year in which the apartment is builtquarter - The
  • quarter - A town or district in the Hmaburg city
  • lambert_poistion_x - Map projection or location of the apartment
  • lambert_poistion_y - Map projection or location of the apartment
  • city - Name of the city
  • postcode - The postal code of the area in which the apartment is available for rent
  • heating_type - What type of heating is avilable in the apartment
  • number_of_bedrooms - The number of bedrooms
  • rent_per_square_meter - The rent in square meters
  • publish_date - The date on which it is published
  • latitude - The geographic coordinates of the apartment
  • longitude - The geographic coordinates of the apartment

Data Profiling

Data profiling is also done using the pandas data profiling, by using this we can automatically generate the profile reports from a pandas DataFrame. The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() and does the exploratory analysis for us. It is fast and efficient. This helps to understand the data very well. The report is also present in the folder by the name of data_profile.

Libraries

If you need all the libraries I have used for this task, please remove the hashtags and install it directly from this script for example "!{sys.executable} -m pip install geopandas". You have to just run it and it will download the library you need

In [1]:
import sys
# !{sys.executable} -m pip install geopandas
# !{sys.executable} -m pip install mplleaflet
# !{sys.executable} -m pip install bs4
# !{sys.executable} -m pip install geocoder
# !{sys.executable} -m pip install geopy
# !{sys.executable} -m pip install folium
# !{sys.executable} -m pip install lxml
# !{sys.executable} -m pip install pygeoj
# !{sys.executable} -m pip install pyshp
# !{sys.executable} -m pip install datetime
# !{sys.executable} -m pip install seaborn
# !{sys.executable} -m pip install --upgrade cython
In [2]:
# required package for timeseries
# !{sys.executable} -m pip install statsmodels
In [3]:
# # required package to run geopandas
#!conda install -c conda-forge libspatialindex -y
In [4]:
# # required packages for neighbourhood analysis
# !{sys.executable} -m pip install geopandas
# !{sys.executable} -m pip install descartes
# !{sys.executable} -m pip install requests
In [5]:
# # requiered packages for accessibility analysis
# # Make sure Cython is upgraded FIRST!
# !{sys.executable} -m pip install pandana
In [6]:
# requiered packages for modelling
#!{sys.executable} -m pip install xgboost
In [7]:
%matplotlib inline

# general
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Geographical analysis
import geopandas as gpf #libspatialindex nees to be installed first
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
import geopandas as gpd
import shapefile as shp
import datetime
from statsmodels.tsa.seasonal import seasonal_decompose
import requests
import descartes
import rtree

# accessibility analysis
import time
# from pandana.loaders import osm
# from pandana.loaders import pandash5

# modelling
from sklearn.preprocessing import StandardScaler, MinMaxScaler 
from sklearn.model_selection import train_test_split, cross_val_score 
from sklearn.linear_model import LinearRegression
from sklearn import linear_model
from sklearn import metrics
import xgboost as xgb
from xgboost import plot_importance
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score

#Hide warnings
import warnings
warnings.filterwarnings('ignore')

# Set plot preference
plt.style.use(style='ggplot')
plt.rcParams['figure.figsize'] = (10, 6)

print('Libraries imported.')
Libraries imported.

Libraries for pandas data profiling which does the exploratory analysis.

In [8]:
from pathlib import Path

# Installed packages
from ipywidgets import widgets

# Our package
from pandas_profiling import ProfileReport
from pandas_profiling.utils.cache import cache_file

Libraries for plotly graphs

In [9]:
# import plotly.plotly as py
# import plotly.graph_objs as go
# from plotly.offline import iplot, init_notebook_mode
from chart_studio.plotly import plot, iplot
import cufflinks as cf
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)

1. Cleaning and Pre-processing

In [10]:
# Load the dataset
df = pd.read_csv('task_data_final.csv')
print(f"The dataset contains {len(df)} Condo Rent listings")
pd.set_option('display.max_columns', len(df.columns)) # To view all columns
pd.set_option('display.max_rows', 100)
df.head(10)
The dataset contains 12500 Condo Rent listings
Out[10]:
living_space rooms cold_rent construction_year quarter lambert_poistion_x lambert_poistion_y city postcode heating_type number_of_bedrooms rent_per_square_meter publish_date latitude longitude
0 116.4 4.5 1453.200 1976.0 Farmsen-Berne 7967.0 2625150.0 Hamburg 22159 6 NaN 14.976 2019-05-30 22:00:00 53.597793 10.121997
1 78.0 3.0 819.600 NaN Wandsbek 5628.0 2623928.0 Hamburg 22047 1 NaN 12.612 2019-01-31 23:00:00 53.586691 10.086159
2 62.4 3.5 504.000 NaN Wilstorf -239.0 2608361.0 Hamburg 21079 1 NaN 9.696 2019-01-31 23:00:00 53.444908 9.996353
3 98.4 3.5 1131.600 1982.0 Rahlstedt 9481.0 2628105.0 Hamburg 22145 1 NaN 13.800 2019-04-30 22:00:00 53.624685 10.145269
4 144.0 3.5 2280.000 NaN Blankenese -13093.0 2620974.0 Hamburg 22587 5 2.0 18.996 2019-03-28 23:00:00 53.559652 9.799682
5 97.8 3.5 1128.000 2021.0 Langenhorn 369.0 2631379.0 Hamburg 22419 5 NaN 13.836 2019-02-28 23:00:00 53.654590 10.005658
6 60.0 2.5 660.000 1964.0 Neugraben-Fischbek -9224.0 2612187.0 Hamburg 21147 6 1.0 13.200 2019-01-31 23:00:00 53.479685 9.859130
7 78.0 2.5 573.768 1970.0 Marmstorf -2662.0 2607988.0 Hamburg 21077 7 NaN 8.832 2019-06-30 22:00:00 53.441503 9.959381
8 61.2 2.5 807.600 NaN Barmbek-Nord 3091.0 2625603.0 Hamburg 22307 1 NaN 15.840 2019-04-30 22:00:00 53.601970 10.047336
9 60.0 2.5 660.000 1964.0 Neugraben-Fischbek -9224.0 2612187.0 Hamburg 21147 6 1.0 13.200 2019-01-31 23:00:00 53.479685 9.859130
In [11]:
df.count()
Out[11]:
living_space             12500
rooms                    12494
cold_rent                12500
construction_year         9890
quarter                  12500
lambert_poistion_x       12420
lambert_poistion_y       12420
city                     12500
postcode                 12500
heating_type             11587
number_of_bedrooms        4778
rent_per_square_meter    12500
publish_date             12500
latitude                 12420
longitude                12420
dtype: int64
In [12]:
df.head()
Out[12]:
living_space rooms cold_rent construction_year quarter lambert_poistion_x lambert_poistion_y city postcode heating_type number_of_bedrooms rent_per_square_meter publish_date latitude longitude
0 116.4 4.5 1453.2 1976.0 Farmsen-Berne 7967.0 2625150.0 Hamburg 22159 6 NaN 14.976 2019-05-30 22:00:00 53.597793 10.121997
1 78.0 3.0 819.6 NaN Wandsbek 5628.0 2623928.0 Hamburg 22047 1 NaN 12.612 2019-01-31 23:00:00 53.586691 10.086159
2 62.4 3.5 504.0 NaN Wilstorf -239.0 2608361.0 Hamburg 21079 1 NaN 9.696 2019-01-31 23:00:00 53.444908 9.996353
3 98.4 3.5 1131.6 1982.0 Rahlstedt 9481.0 2628105.0 Hamburg 22145 1 NaN 13.800 2019-04-30 22:00:00 53.624685 10.145269
4 144.0 3.5 2280.0 NaN Blankenese -13093.0 2620974.0 Hamburg 22587 5 2.0 18.996 2019-03-28 23:00:00 53.559652 9.799682
In [13]:
df.tail()
Out[13]:
living_space rooms cold_rent construction_year quarter lambert_poistion_x lambert_poistion_y city postcode heating_type number_of_bedrooms rent_per_square_meter publish_date latitude longitude
12495 78.84 3.5 684.000 1962.0 Rahlstedt 8515.0 2625898.0 Hamburg 22147 5 NaN 10.416 2016-01-31 23:00:00 53.604598 10.130409
12496 60.00 2.5 930.000 1874.0 Barmbek-Süd 1937.0 2622714.0 Hamburg 22083 NaN 1.0 18.600 2016-02-29 23:00:00 53.575659 10.029646
12497 92.94 2.5 606.000 NaN St. Pauli -2575.0 2619807.0 Hamburg 20359 1 NaN 7.824 2016-10-30 22:00:00 53.549175 9.960613
12498 69.54 2.5 748.200 1939.0 Winterhude 1231.0 2624265.0 Hamburg 22303 7 1.0 12.912 2016-03-29 23:00:00 53.589789 10.018847
12499 73.20 3.0 518.532 1958.0 Heimfeld -2245.0 2610261.0 Hamburg 21075 5 NaN 8.496 2016-06-30 22:00:00 53.462213 9.965728

2. Data Profiling

In [14]:
profile = ProfileReport(df, title="Condo Rent Prices", html={'style': {'full_width': True}}, sort="None")
In [15]:
# Or use the HTML report in an iframe
profile.to_notebook_iframe()



In [16]:
profile.to_file(output_file="data_profile.html")

In [17]:
df.describe()
Out[17]:
living_space rooms cold_rent construction_year lambert_poistion_x lambert_poistion_y postcode number_of_bedrooms rent_per_square_meter latitude longitude
count 12500.000000 12494.000000 12500.000000 9890.000000 12420.000000 1.242000e+04 12500.000000 4778.000000 12500.000000 12420.000000 12420.000000
mean 85.662941 2.997703 1043.164076 1975.236097 565.687520 2.622120e+06 21745.058560 1.566346 14.558444 53.570219 10.008664
std 35.876667 0.933966 580.836947 36.955579 5827.887744 5.687424e+03 886.652874 0.740789 4.313338 0.051816 0.089164
min 13.200000 1.500000 159.600000 1622.000000 -17044.000000 2.603521e+06 2103.000000 0.000000 6.000000 53.400691 9.739181
25% 62.400000 2.500000 658.800000 1958.000000 -2851.000000 2.620085e+06 21073.000000 1.000000 11.592000 53.551689 9.956471
50% 79.200000 3.000000 883.974000 1974.000000 413.000000 2.622926e+06 22117.000000 1.000000 14.040000 53.577556 10.006320
75% 100.800000 3.500000 1267.500000 2010.000000 3975.000000 2.625299e+06 22399.000000 2.000000 16.872000 53.599198 10.060839
max 326.400000 9.000000 4788.000000 2023.000000 19121.000000 2.637566e+06 27661.000000 6.000000 35.496000 53.710900 10.291816

From the above summary we can notice one thing in rooms column and that is there is no 1 room apartment available for rent, the minimum number of rooms are 1.5. The maximum number of rooms are 9.

The minimum cold rent is 159.60 and the maximum is 4788.00. The minimum rent per suare meter is 6.00 and the maximum is 35.49.

In [18]:
# Checking the null values in a dataset
df.isnull().any()
Out[18]:
living_space             False
rooms                     True
cold_rent                False
construction_year         True
quarter                  False
lambert_poistion_x        True
lambert_poistion_y        True
city                     False
postcode                 False
heating_type              True
number_of_bedrooms        True
rent_per_square_meter    False
publish_date             False
latitude                  True
longitude                 True
dtype: bool
In [19]:
# Sum of null values
df.isnull().sum()
Out[19]:
living_space                0
rooms                       6
cold_rent                   0
construction_year        2610
quarter                     0
lambert_poistion_x         80
lambert_poistion_y         80
city                        0
postcode                    0
heating_type              913
number_of_bedrooms       7722
rent_per_square_meter       0
publish_date                0
latitude                   80
longitude                  80
dtype: int64

There are many missing values in the dataset especially the number_of_bedrooms i.e 7722. It would be better to get rid of the columns with so many missing values.

In [20]:
def percent_value_counts(df, feature):
    """This function takes in a dataframe and a column and finds the percentage of the value_counts"""
    percent = pd.DataFrame(round(df.loc[:,feature].value_counts(dropna=False, normalize=True)*100,2))
    ## creating a df with the
    total = pd.DataFrame(df.loc[:,feature].value_counts(dropna=False))
    
    ## concating percent and total dataframe
    total.columns = ["Total"]
    percent.columns = ['Percent']
    return pd.concat([total, percent], axis = 1)
In [21]:
percent_value_counts(df, 'number_of_bedrooms') 
Out[21]:
Total Percent
NaN 7722 61.78
1.0 2657 21.26
2.0 1591 12.73
3.0 438 3.50
4.0 70 0.56
0.0 12 0.10
5.0 9 0.07
6.0 1 0.01
In [22]:
percent_value_counts(df, 'construction_year') 
Out[22]:
Total Percent
NaN 2610 20.88
2022.0 395 3.16
2020.0 349 2.79
2021.0 320 2.56
1904.0 294 2.35
... ... ...
1851.0 1 0.01
1724.0 1 0.01
1648.0 1 0.01
1882.0 1 0.01
1818.0 1 0.01

157 rows × 2 columns

We can see the total number of missing values and percentage of missing values for a particular attribute.

In [23]:
# Plotting the distribution of numerical and boolean categories
df.hist(figsize=(20,20));

From the above, it can be seen that several columns only contain few categories and can be dropped such as number_of_bedrooms, construction_year, heating_type. We can also remove lambert position, we don't need that. Later we can see what else we can remove.

In [24]:
df.drop(['number_of_bedrooms', 'construction_year', 'heating_type', 'lambert_poistion_x', 'lambert_poistion_y'], axis=1, inplace=True)
In [25]:
df.isna().sum()
Out[25]:
living_space              0
rooms                     6
cold_rent                 0
quarter                   0
city                      0
postcode                  0
rent_per_square_meter     0
publish_date              0
latitude                 80
longitude                80
dtype: int64

Now still we have missing values, we need the locations of the apartments and 80 are missing, we can get rid of that.

In [26]:
df = df.dropna()
In [27]:
df.isna().sum()
Out[27]:
living_space             0
rooms                    0
cold_rent                0
quarter                  0
city                     0
postcode                 0
rent_per_square_meter    0
publish_date             0
latitude                 0
longitude                0
dtype: int64
In [28]:
df.count()
Out[28]:
living_space             12414
rooms                    12414
cold_rent                12414
quarter                  12414
city                     12414
postcode                 12414
rent_per_square_meter    12414
publish_date             12414
latitude                 12414
longitude                12414
dtype: int64

Now the dataset is clean and perfect for the analysis and prediction.

3. Analysis and Visualization

Let's start with the bar graph, let's see the cold rent according to the rooms. It is but obvious that if the number of rooms increase the rent also increases. Let's check that.

In [29]:
rent_pivot = df.pivot_table(index="rooms",values="cold_rent")
rent_pivot.plot.bar()
plt.show()

In the above bar graph, we can see that as the number of rooms increases, the cold rent increases. But in the case of 3.7 rooms (don't know if there are 2.7 or 3.7 rooms available) the rent is low as compare to others. Let's check for the rent per square meter with rooms. Is there any difference?

In [30]:
rent_pivot = df.pivot_table(index="rooms",values="rent_per_square_meter")
rent_pivot.plot.bar()
plt.show()

In the above plot, we can again see that 3.7 rooms have a lower rent price as compare to others. We can also replace 2.7 and 3.7 with 2.5 and 3.5 but we will keep it like this. The rent per square meter bar graph looks good, 2.0 and 2.5 rooms per square meter price is lower than the 1.5 room.

I am very curious about 2.7 and 3.7 rooms. Let's check how many 2.7 and 3.7 rooms are there in a dataset.

In [31]:
percent_value_counts(df, 'rooms') 
Out[31]:
Total Percent
2.5 4422 35.62
3.5 3666 29.53
1.5 1391 11.21
4.5 970 7.81
3.0 946 7.62
2.0 324 2.61
4.0 312 2.51
5.5 183 1.47
5.0 103 0.83
6.5 43 0.35
6.0 19 0.15
7.5 10 0.08
2.7 7 0.06
7.0 6 0.05
8.0 4 0.03
8.5 4 0.03
3.7 2 0.02
9.0 2 0.02

Now we can see that there are total of 7 and 0.06% of 2.7 rooms and the total of 2 and 0.02% of 3.7 rooms present in the dataset. I believe this is some typing error or mistake. We can simply replace 2.7 rooms with 2.5 and 3.7 rooms with 3.5 as round numbers.

In [32]:
# replacing the 2.7 and 3.7 rooms with 2.5 and 3.5
df.rooms= df.rooms.astype(str).str.replace('3.7','3.5',regex=True)
df.rooms= df.rooms.astype(str).str.replace('2.7','2.5',regex=True)
In [33]:
# Checking the replacement 
percent_value_counts(df, 'rooms') 
Out[33]:
Total Percent
2.5 4429 35.68
3.5 3668 29.55
1.5 1391 11.21
4.5 970 7.81
3.0 946 7.62
2.0 324 2.61
4.0 312 2.51
5.5 183 1.47
5.0 103 0.83
6.5 43 0.35
6.0 19 0.15
7.5 10 0.08
7.0 6 0.05
8.5 4 0.03
8.0 4 0.03
9.0 2 0.02
In [34]:
# Checking the rent per square meter in the city
pd.set_option('display.max_rows', 500)
df.groupby('city').max()['rent_per_square_meter']
Out[34]:
city
20535 hamburg                                         10.668
22455                                                 10.992
Altona-Altstadt                                       32.220
Bahrenfeld                                            12.408
Barmbek                                               14.352
Barmbek-Nord                                           7.860
Barmbel Büd                                           14.400
Bergedorf                                             15.444
Borgfelde                                             15.732
Braak                                                 21.336
Bramfeld                                              13.740
Bramfeld, Hamburg                                     12.000
Eidelstedt                                            13.236
Eppendorf                                             18.600
Farmsen-Berne                                         14.592
H                                                     17.736
HAMBURG                                               20.052
HAMBURG - MEIENDORF                                   13.512
HAMBURG - RAHLSTEDT/ OLDENFELDE                       10.824
HH-Hoheluft-West                                      15.672
Ha,mburg                                              17.004
Hambrug                                               19.656
Hambrug-Bramfeld                                      11.328
Hambur Hamm                                           17.364
Hamburg                                               35.496
Hamburg  (Lokstedt)                                   15.132
Hamburg  Neugraben                                    11.292
Hamburg (Alt-Osdorf)                                  14.724
Hamburg (Barmbek-Süd)                                 18.108
Hamburg (Eimsbüttel)                                  11.688
Hamburg (Finkenwerder)                                 9.756
Hamburg (Groß Flottbek)                               16.728
Hamburg (HH-Stellingen)                               14.844
Hamburg (Othmarschen)                                 15.360
Hamburg (Schierenberg)                                15.900
Hamburg (Tonndorf)                                    16.740
Hamburg - Altona                                      19.404
Hamburg - Altona-Nord                                 17.964
Hamburg - Barmbek-Süd                                 11.400
Hamburg - Bergedorf                                   14.832
Hamburg - Blankenese                                  18.984
Hamburg - Brahmfeld                                   13.968
Hamburg - Bramfeld                                    12.252
Hamburg - Eilbek                                       9.600
Hamburg - Eißendorf                                   11.928
Hamburg - Farmsen-Berne                               16.596
Hamburg - Groß Borstel                                13.800
Hamburg - Hafencity                                    7.440
Hamburg - Hammerbrook                                 18.360
Hamburg - Harburg                                     10.788
Hamburg - Harvestehude                                23.292
Hamburg - Heimfeld                                     9.216
Hamburg - Hummelsbüttel                               12.000
Hamburg - Jenfeld                                     11.748
Hamburg - Langenhorn                                  13.332
Hamburg - Lokstedt                                    19.428
Hamburg - Marienthal                                  12.696
Hamburg - Marmstorf                                   11.064
Hamburg - Moorfleet                                   12.000
Hamburg - Ottensen                                    16.824
Hamburg - Rahlstedt                                   13.920
Hamburg - Schnelsen                                   12.000
Hamburg - Sülldorf                                    12.912
Hamburg - Tonndorf                                    17.652
Hamburg - Uhlenhorst                                  19.656
Hamburg - Vierlanden                                  12.756
Hamburg - Wandsbek                                    19.200
Hamburg -Bergedorf                                    11.724
Hamburg -Harburg                                      10.848
Hamburg -Klein Flottbek                               16.092
Hamburg / Allermöhe 21037                             12.528
Hamburg / Alsterdorf                                  16.044
Hamburg / Altona-Nord                                 35.196
Hamburg / Altstadt                                    26.676
Hamburg / Bahrenfeld                                  15.636
Hamburg / Barmbek                                     17.736
Hamburg / Barmbek-Nord                                 9.636
Hamburg / Barmbek-Süd                                 21.924
Hamburg / Bergedorf                                   14.184
Hamburg / Blankenese                                  19.296
Hamburg / Bramfeld                                    15.264
Hamburg / Curslack                                    13.920
Hamburg / Dockenhuden                                 14.388
Hamburg / Dulsberg                                    15.300
Hamburg / Eilbek                                      20.400
Hamburg / Eißendorf                                    9.588
Hamburg / Eppendorf                                   26.292
Hamburg / Farmsen                                     12.312
Hamburg / Finkenwerder                                17.640
Hamburg / Fuhlsbüttel                                 17.472
Hamburg / Groß Borstel                                12.516
Hamburg / Hamm-Nord                                   17.016
Hamburg / Harburg                                     21.252
Hamburg / Harvestehude                                29.688
Hamburg / Heimfeld                                    16.788
Hamburg / Hoheluft-West                               18.960
Hamburg / Hummelsbüttel                               17.520
Hamburg / Jenfeld                                     15.780
Hamburg / Kirchwerder                                 13.824
Hamburg / Klein Flottbek                              17.544
Hamburg / Lohbrügge                                   12.960
Hamburg / Lokstedt                                    20.628
Hamburg / Lurup                                       13.980
Hamburg / Marienthal                                  15.636
Hamburg / Marmstorf                                   10.968
Hamburg / Neustadt                                    14.748
Hamburg / Neuwiedenthal                               10.992
Hamburg / Niendorf                                    15.648
Hamburg / Ochsenwerder                                12.000
Hamburg / Ohlsdorf                                    14.052
Hamburg / Ottensen                                    18.048
Hamburg / Poppenbüttel                                12.000
Hamburg / Rahlstedt                                   12.948
Hamburg / Rissen                                      10.056
Hamburg / Rothenburgsort                              16.164
Hamburg / Rotherbaum                                  15.600
Hamburg / Sankt Georg                                 16.260
Hamburg / Sankt Pauli                                 26.772
Hamburg / Sasel                                       12.456
Hamburg / Schnelsen                                   15.936
Hamburg / Sinstorf                                    14.244
Hamburg / Stellingen                                  16.020
Hamburg / Tonndorf                                    11.820
Hamburg / Uhlenhorst                                  20.568
Hamburg / Volksdorf                                   12.540
Hamburg / Wandsbek                                    13.368
Hamburg / Wellingsbüttel                              19.032
Hamburg / Wilhelmsburg                                17.700
Hamburg / Wilstorf                                    10.740
Hamburg / Winterhude                                  21.336
Hamburg / Wohldorf-Ohlstedt                           12.756
Hamburg Alsterdorf                                    15.948
Hamburg Alt Rahlstedt                                 11.808
Hamburg Alt-Osdorf                                    15.528
Hamburg Barmbek Süd                                   17.316
Hamburg Barmbek-Nord                                  12.024
Hamburg Bergedorf                                     15.660
Hamburg Bergstedt                                     18.396
Hamburg Billstedt                                     12.624
Hamburg Borgfelde                                     13.884
Hamburg Bramfeld                                      14.580
Hamburg Eidelstedt                                    12.612
Hamburg Eilbek                                        11.712
Hamburg Eimsbüttel                                    15.204
Hamburg Eißendorf                                     13.500
Hamburg Eppendorf                                     24.108
Hamburg Finkenwerder                                  11.028
Hamburg Grindelhof                                    17.352
Hamburg Harburg                                       12.000
Hamburg Harburg / Wilstorf                             9.768
Hamburg Harvestehude                                  21.276
Hamburg Heimfeld                                      12.000
Hamburg Hohenfelde                                    20.964
Hamburg Horn                                          11.568
Hamburg Hummelsbüttel / Fuhlsbüttel Nord              15.720
Hamburg Kirchwerder                                   12.612
Hamburg Klein Flottbek                                21.360
Hamburg Lohbrügge                                     13.800
Hamburg Lokstedt                                      18.972
Hamburg Lurup                                         14.652
Hamburg Marienthal                                    16.668
Hamburg Neustadt                                       7.344
Hamburg Niendorf                                      14.172
Hamburg OT Bahrenfeld                                 19.200
Hamburg OT Lemsahl-Mellingstedt                       13.248
Hamburg Ohlsdorf                                      11.052
Hamburg Ohlstedt                                      12.144
Hamburg Ojendorf/Billstedt                            14.448
Hamburg Ottensen                                      22.200
Hamburg Poppenbüttel                                  14.832
Hamburg Rahlstedt                                     15.612
Hamburg Rissen                                        14.676
Hamburg Rothenburgsort                                16.644
Hamburg Rotherbaum                                    18.828
Hamburg Rönneburg                                     15.000
Hamburg Sankt Pauli                                   21.996
Hamburg St Pauli                                      17.916
Hamburg St. Georg                                     13.776
Hamburg St. Pauli                                     17.148
Hamburg Stelling                                      15.552
Hamburg Sülldorf                                       8.508
Hamburg Uhlenhorst                                    24.780
Hamburg Wandsbek                                      28.500
Hamburg Wellingsbüttel                                17.460
Hamburg Winterhude                                    30.780
Hamburg Winterude Forsmannstr.                        15.624
Hamburg hummelsbüttel                                 12.444
Hamburg,                                              20.400
Hamburg, Altona-Altstadt                               9.912
Hamburg, Bahrenfeld                                    7.308
Hamburg, Barmbek-Süd                                  10.644
Hamburg, Bergedorf                                     9.804
Hamburg, Groß Borstel                                 18.960
Hamburg, HafenCity                                    35.400
Hamburg, HafenCity, Osakaallee 2                      16.200
Hamburg, HafenCity, Osakaallee 6                      18.780
Hamburg, HafenCity, Singapurstraße 19                 17.052
Hamburg, Hammerbrook                                  20.328
Hamburg, Hoheluft-Ost                                 10.248
Hamburg, Hohenfelde                                   10.248
Hamburg, Ottensen                                     31.668
Hamburg, Rothenburgsort                               18.000
Hamburg, Rotherbaum                                    7.368
Hamburg, Schnelsen                                    17.880
Hamburg, St. Georg                                     8.184
Hamburg, St. Pauli                                    10.236
Hamburg, Uhlenhorst                                    9.816
Hamburg, Winterhude                                   11.604
Hamburg-                                              16.800
Hamburg- Cranz                                        10.416
Hamburg- Fischbek                                     13.296
Hamburg- Harburg                                      11.184
Hamburg- Neugraben                                    10.812
Hamburg- Ottensen                                     18.768
Hamburg-Allermöhe                                     13.044
Hamburg-Alsterdorf                                    13.356
Hamburg-Alt Rahlstedt/Oldenfelde                      13.668
Hamburg-Altona                                        18.000
Hamburg-Altona (Lurup)                                12.000
Hamburg-Altona/St.Pauli                               16.716
Hamburg-Altstadt                                      20.160
Hamburg-Bahrenfeld                                    18.480
Hamburg-Barmbek                                       14.688
Hamburg-Barmbek Nord                                  14.484
Hamburg-Barmbek-Nord                                  17.928
Hamburg-Bergedorf                                     13.860
Hamburg-Bergstedt                                     14.256
Hamburg-Berne                                         11.280
Hamburg-Billstedt                                      9.372
Hamburg-Binnenhafen                                   13.812
Hamburg-Blankenese                                    16.956
Hamburg-Borgfelde                                     16.800
Hamburg-Bramfeld                                      13.080
Hamburg-Duhlsberg                                     11.328
Hamburg-Dulsberg                                      14.880
Hamburg-Duvenstedt                                    11.124
Hamburg-Eidelstedt                                    13.308
Hamburg-Eilbek                                        16.800
Hamburg-Eimsbüttel                                    20.388
Hamburg-Eppedorf                                      15.948
Hamburg-Eppendorf                                     20.772
Hamburg-Finkenwerder                                  10.500
Hamburg-Fuhlsbüttel                                   12.816
Hamburg-Groß Borstel                                  16.176
Hamburg-Groß Flottbek                                 24.504
Hamburg-HafenCity                                     31.008
Hamburg-Hafencity                                     20.844
Hamburg-Hamm                                          16.812
Hamburg-Hamm-Nord                                     16.800
Hamburg-Harburg                                       15.000
Hamburg-Harburg (OT: Rönneburg)                       10.824
Hamburg-Harvestehude                                  23.856
Hamburg-Hausbruch                                     13.176
Hamburg-Heimfeld                                      16.080
Hamburg-Hoheluft                                      19.224
Hamburg-Hoheluft Ost                                  24.132
Hamburg-Hoheluft-Ost                                  14.448
Hamburg-Hoheluft-West                                 10.440
Hamburg-Hohenfelde                                    15.504
Hamburg-Horn                                          16.788
Hamburg-Iserbrook                                     18.072
Hamburg-Jenfeld                                        8.100
Hamburg-Langenhorn                                    15.432
Hamburg-Lohbrügge                                     10.380
Hamburg-Lokstedt                                      18.096
Hamburg-Lurup                                         13.416
Hamburg-Marienthal                                    17.532
Hamburg-Marmsdorf                                     20.004
Hamburg-Marmstorf                                      8.712
Hamburg-Meiendorf                                     14.352
Hamburg-Mümmelmannsberg                                6.624
Hamburg-Neugraben                                     15.516
Hamburg-Neugraben Fischbek                            11.400
Hamburg-Neustadt                                      18.000
Hamburg-Niendorf                                      18.000
Hamburg-Nienstedten                                   19.008
Hamburg-Ohlsdorf                                      12.000
Hamburg-Osdorf                                        17.808
Hamburg-Othmarschen                                   13.368
Hamburg-Ottensen                                      19.200
Hamburg-Poppenbüttel                                  14.400
Hamburg-Rahlstedt Meiendorf                           16.476
Hamburg-Rissen                                        15.948
Hamburg-Rotherbaum                                    19.260
Hamburg-Sasel                                         16.800
Hamburg-Schnelsen                                     14.064
Hamburg-Sinstorf                                      13.452
Hamburg-St. Georg                                     21.000
Hamburg-St. Pauli                                     18.060
Hamburg-Stellingen                                    11.388
Hamburg-Tonndorf                                      14.076
Hamburg-Uhlenhorst                                    21.600
Hamburg-Volksdorf                                     12.300
Hamburg-WInterhude                                    16.272
Hamburg-Wandsbek                                      14.940
Hamburg-Wellingsbüttel                                21.492
Hamburg-Wilhelmsburg                                  21.204
Hamburg-Wilstorf                                      11.976
Hamburg-Winterhude                                    33.132
Hamburg-Wohldorf-Ohlstedt                             19.200
Hamburg/Eppendorf                                     17.892
Hamburg/Farmsen                                       15.000
Hamburg/Winterhude                                    19.668
Hamburghttps://www.immobilienscout24.de/scoutmanag    18.000
Hamburgt                                              15.000
Hamburug                                              19.476
Hammburg                                              16.296
Hamnurg                                               19.200
Hamrburg Uhlenhorst                                   13.128
Hanburg                                               15.156
Harburg                                               12.000
Harvestehude                                          17.016
Hmaburg                                               34.284
Hoheluft-West                                         14.424
Hummelsbüttel                                         14.004
Iserbrook                                             14.208
Jenfeld                                                9.372
Lemsahl-Mellingstedt                                  12.300
Lokstedt                                              15.936
Mitte - Hamburg Billstedt                             15.432
Moorfleet                                             11.400
Negraben-Fischbek                                     11.244
Neugraben - Fischbek                                  13.800
Neugraben-Fischbek                                    13.200
Neumünster                                            11.712
Nienstedten                                           17.688
Ochsenwerder                                          14.568
Othmarschen                                           18.060
Rahlstedt                                             18.324
Rahlstedt-Zentrum                                      9.060
Rissen                                                18.648
St. Georg                                             18.000
St. Pauli                                             27.540
Uhlenhorst                                            20.004
Uhlenhorst Hamburg                                    16.200
Volksdorf                                             10.560
Wilhelmsburg                                          11.712
Winterhude                                            24.276
hamburg                                               17.796
hamburg bramfeld                                      12.288
hamburg-Eimsbüttel                                    13.908
hgttfrf                                               12.996
osdorf                                                15.936
Name: rent_per_square_meter, dtype: float64

There are so many spelling mistakes, syntax and errors in the column city name. It is not clean and properly managed the names. It could give us the wrong impression and data values. It is not also good for the prediction model. We need to get rid of these spelling mistakes and syntax. Somewhere it is written Hamnurg, Hammburg etc. Somewhere it is like Altona-Altstadt and the same is Hamburg Altona-Altstadt. We need to clean this cloumn.

In [35]:
df = df[df.city != 'H']
df = df[df.city != 'hgttfrf']
df = df[df.city != 'osdorf']
df.city= df.city.str.replace('Hamburg/','Hamburg ',regex=True)
df.city= df.city.str.replace('Hamburg /','Hamburg ',regex=True)
df.city= df.city.str.replace('Hamburg-','Hamburg ' ,regex=True)
df.city= df.city.str.replace('Hamburg -','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg,','Hamburg ',regex=True)
df.city= df.city.str.replace('HAMBURG -','Hamburg',regex=True)
df.city= df.city.str.replace('Hambrug','Hamburg ',regex=True)
df.city= df.city.str.replace('Hambur','Hamburg ',regex=True)
df.city= df.city.str.replace('Ha,mburg','Hamburg ',regex=True)
df.city= df.city.str.replace('Hamburghttps://www.immobilienscout24.de/scoutmanag','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg https://www.immobilienscout24.de/scoutmanag','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburghttps://www.immobilienscout24.de/scoutmanag','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg g','Hamburg ',regex=True)
df.city= df.city.str.replace('Hamburg t','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg ug','Hamburg',regex=True)
df.city= df.city.str.replace('Hanburg','Hamburg',regex=True)
df.city= df.city.str.replace('Harburg','Hamburg',regex=True)
df.city= df.city.str.replace('HAMBURG','Hamburg',regex=True)
df.city= df.city.str.replace('Hmaburg','Hamburg',regex=True)
df.city= df.city.str.replace('20535 hamburg','Hamburg',regex=True)
df.city= df.city.str.replace('Mitte - Hamburg  Billstedt','Hamburg Billstedt',regex=True)
df.city= df.city.str.replace('22455','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg ','Hamburg',regex=True)
df.city= df.city.str.replace('-Bramfeld','',regex=True)
df.city= df.city.str.replace('Hamnurg ','Hamburg',regex=True)
df.city= df.city.str.replace('Hamnurg','Hamburg',regex=True)
df.city= df.city.str.replace('Hammburg','Hamburg',regex=True)
df.city= df.city.str.replace('Hamrburg','Hamburg',regex=True)
df.city= df.city.str.replace('hamburg-','Hamburg ',regex=True)
df.city= df.city.str.replace('hamburg','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburghttps://www.immobilienscout24.de/scoutmanag','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg Hamburg','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg  HafenCity, Osakaallee 6','Hamburg',regex=True)
#df.city= df.city.str.replace('Altona-Nord','Hamburg Altona-Nord',regex=True)
df.city= df.city.str.replace('Altona-Altstadt','Hamburg Altona-Altstadt',regex=True)
df.city= df.city.str.replace('Hamburg/ Wilstorf','Hamburg Wilstorf',regex=True)
df.city= df.city.str.replace('Bramfeld, Hamburg','Hamburg',regex=True)
df.city= df.city.str.replace('Hamburg  ','Hamburg ',regex=True)
 
In [36]:
# df['city'] = df['city'].str.strip()
# df['city'] = df['city'].str.lstrip()
# df['city'] = df['city'].str.rstrip()
# #df['city'] = df['city'].str.replace("  "," ")
In [37]:
# lets check it again 
pd.set_option('display.max_rows', 500)
df.groupby(('city'), sort=False)['rent_per_square_meter'].max()
Out[37]:
city
Hamburg                                     35.496
Hamburg Niendorf                            18.000
Hamburg Marienthal                          17.532
Hamburg Jenfeld                             15.780
Hamburg Schnelsen                           17.880
Hamburg Eppendorf                           26.292
Hamburg HafenCity                           35.400
Eidelstedt                                  13.236
Neumünster                                  11.712
Hamburg Klein Flottbek                      21.360
Hamburg Winterhude                          33.132
Hamburg Harvestehude                        29.688
Hamburg hummelsbüttel                       12.444
Hamburg (HH-Stellingen)                     14.844
Uhlenhorst                                  20.004
Hamburg Heimfeld                            16.788
Hamburg Barmbek                             17.736
Hamburg Lohbrügge                           13.800
Rissen                                      18.648
Hamburg Hafencity                           20.844
Hamburg Lokstedt                            20.628
Hamburg Ojendorf/Billstedt                  14.448
Hamburg Bahrenfeld                          18.480
Hamburg Rahlstedt                           15.612
Hamburg Sasel                               16.800
Hamburg Altona                              19.404
Hamburg Farmsen-Berne                       16.596
Hamburg Curslack                            13.920
Hamburg Horn                                16.788
Hamburg St. Pauli                           18.060
Hamburg Bergedorf                           15.660
Hamburg Sankt Pauli                         26.772
Hamburg Hoheluft                            19.224
Hamburg Barmbek Nord                        14.484
Hamburg Barmbek-Nord                        17.928
Hamburg Ottensen                            31.668
Hamburg Hamm                                17.364
Hamburg Eimsbüttel                          20.388
Hamburg Cranz                               10.416
Hamburg Poppenbüttel                        14.832
Hamburg Alsterdorf                          16.044
Hamburg Altona-Altstadt                     32.220
Hamburg Wellingsbüttel                      21.492
Hamburg Marmstorf                           11.064
Hamburg Wandsbek                            28.500
Hamburg Neugraben                           15.516
Hamburg Finkenwerder                        17.640
Hamburg HafenCity, Osakaallee 2             16.200
Hamburg Rissen                              15.948
Hamburg Farmsen                             15.000
Hamburg Lurup                               14.652
Hamburg Brahmfeld                           13.968
Hamburg Bramfeld                            15.264
Winterhude                                  24.276
Hamburg Tonndorf                            17.652
Hamburg Eilbek                              20.400
Hamburg Hausbruch                           13.176
Hamburg Rotherbaum                          19.260
St. Pauli                                   27.540
Hamburg Eidelstedt                          13.308
Hamburg Hamm-Nord                           17.016
Hamburg Altona-Nord                         35.196
Hamburg Langenhorn                          15.432
Hamburg Dulsberg                            15.300
Hamburg Wilhelmsburg                        21.204
Hamburg Barmbek-Süd                         21.924
Hamburg Uhlenhorst                          24.780
Hamburg Hamburg                             21.252
Hamburg Ochsenwerder                        12.000
Hamburg (Groß Flottbek)                     16.728
Hamburg Alt Rahlstedt                       11.808
Hamburg Fischbek                            13.296
Hamburg OT Lemsahl-Mellingstedt             13.248
Hamburg Neustadt                            18.000
Hamburg Groß Flottbek                       24.504
Hamburg Borgfelde                           16.800
Barmbek-Nord                                 7.860
Hamburg Wilstorf                            11.976
Hamburg Iserbrook                           18.072
Hamburg Eißendorf                           13.500
Uhlenhorst Hamburg                          16.200
Wilhelmsburg                                11.712
Braak                                       21.336
Hamburg Wohldorf-Ohlstedt                   19.200
Hamburg Kirchwerder                         13.824
Hamburg (Tonndorf)                          16.740
Hamburg (Schierenberg)                      15.900
Hamburg Stellingen                          16.020
HamburgBergedorf                            11.724
Hoheluft-West                               14.424
Hamburg Stelling                            15.552
Hamburg Groß Borstel                        18.960
Hamburg Meiendorf                           14.352
HamburgBillstedt                            15.432
Hamburg Hohenfelde                          20.964
Hamburg Ohlsdorf                            14.052
Hamburg RAHLSTEDT/ OLDENFELDE               10.824
Hamburg Bergstedt                           18.396
Hamburg Othmarschen                         13.368
Hamburg Fuhlsbüttel                         17.472
Hamburg Neugraben Fischbek                  11.400
Hamburg Blankenese                          19.296
Neugraben - Fischbek                        13.800
Hamburg Rahlstedt Meiendorf                 16.476
Hamburg Ohlstedt                            12.144
Hamburg Hummelsbüttel                       17.520
Hamburg (Lokstedt)                          15.132
St. Georg                                   18.000
Jenfeld                                      9.372
Hamburg St. Georg                           21.000
Hamburg                                     20.400
Borgfelde                                   15.732
Nienstedten                                 17.688
Hamburg Dockenhuden                         14.388
Hamburg Rothenburgsort                      18.000
Hummelsbüttel                               14.004
Hamburg Allermöhe                           13.044
Hamburg Hammerbrook                         20.328
Hamburg Volksdorf                           12.540
Hamburg OT Bahrenfeld                       19.200
Neugraben-Fischbek                          13.200
Farmsen-Berne                               14.592
Rahlstedt-Zentrum                            9.060
Hamburg (Finkenwerder)                       9.756
Hamburg Altona/St.Pauli                     16.716
Iserbrook                                   14.208
Hamburg Alt-Osdorf                          15.528
Hamburg Winterude Forsmannstr.              15.624
Hamburg Alt Rahlstedt/Oldenfelde            13.668
Bergedorf                                   15.444
Lemsahl-Mellingstedt                        12.300
Hamburg Sinstorf                            14.244
Hamburg Neuwiedenthal                       10.992
Bahrenfeld                                  12.408
Hamburg Billstedt                           12.624
Hamburg Marmsdorf                           20.004
Hamburg Hoheluft-West                       18.960
Ochsenwerder                                14.568
Rahlstedt                                   18.324
HamburgKlein Flottbek                       16.092
Hamburg MEIENDORF                           13.512
Bramfeld                                    13.740
Hamburg Barmbek Süd                         17.316
Othmarschen                                 18.060
Hamburg Grindelhof                          17.352
Hamburg Hummelsbüttel / Fuhlsbüttel Nord    15.720
Hamburg Altstadt                            26.676
Hamburg bramfeld                            12.288
Hamburg Osdorf                              17.808
Hamburg Hoheluft-Ost                        14.448
Hamburg Duhlsberg                           11.328
Hamburg Duvenstedt                          11.124
Hamburg (Eimsbüttel)                        11.688
Hamburg Mümmelmannsberg                      6.624
Barmbek                                     14.352
Harvestehude                                17.016
Hamburg WInterhude                          16.272
Lokstedt                                    15.936
Hamburg HafenCity, Singapurstraße 19        17.052
Hamburg Nienstedten                         19.008
Hamburg Berne                               11.280
Hamburg Rönneburg                           15.000
Hamburg Altona (Lurup)                      12.000
Volksdorf                                   10.560
Hamburg(OT: Rönneburg)                      10.824
Hamburg (Alt-Osdorf)                        14.724
Hamburg Binnenhafen                         13.812
Hamburg St Pauli                            17.916
Hamburg Hoheluft Ost                        24.132
Hamburg Eppedorf                            15.948
HamburgHamburg                              10.848
Hamburg Allermöhe 21037                     12.528
Eppendorf                                   18.600
Negraben-Fischbek                           11.244
Hamburg Hamburg Altona-Altstadt              9.912
Hamburg Vierlanden                          12.756
HH-Hoheluft-West                            15.672
Barmbel Büd                                 14.400
Hamburg Moorfleet                           12.000
Hamburg Sülldorf                            12.912
Hamburg (Barmbek-Süd)                       18.108
Moorfleet                                   11.400
Hamburg Sankt Georg                         16.260
Hamburg (Othmarschen)                       15.360
Name: rent_per_square_meter, dtype: float64

Now it looks much better than the previous one. We removed the empty spaces, hyphens, commas and cleaned some names in a proper manner or format. Hamburg shows the high rent rates of 35.496 per square meter. There are few cities which have higher rent in the 30's. Let's see the same in a plot, that would be much clearer and visible than this.

Now we will use the plotly graphs to build some beautiful and interactive charts, we are using plotly because there are so many categories and values and by using plotly we can easily zoom in or zoom out the graphs and whenever there is something which is not clear we can hover the mouse pointer to the graphs and check the values, names, numbers related to that. We can also select and deselect the attributes and check the values.

In [38]:
df.pivot(columns='city', values='rent_per_square_meter').iplot(
        kind='box',
        yTitle='Rent Per Square Meter',
        title='City and Rent')

Now in the above box plot graph if we move to hover the mouse cursor to the graph it shows the values and we can also select and deselect the attributes on the right-hand side. With the box plot above we can see the rent per square meter in the city. We can also check the minimum rent, mean and maximum rent in the city.

We can see that maximum rent is in the city Hamburg and Hamburg Hafencity with 35.4 followed by Hamburg Altona-Nord with 35.196.

The interesting city is Hamburg Altona-Altstadt, the apartment rents are very high there above 31. The maximum rent is 32.22, the median is 31.848 and the minimum is 31.476. Hamburg has the maximum rent but it has the minimum rent of 6 per square meter as well.

Hamburg Mümmelmannsberg has the maximum rent of 6.624, median 6.384 and minimum of 6.

In [39]:
# Checking the rent per square meter in the quarter. We can also see if the names written are correct or in the 
# right format
pd.set_option('display.max_rows', 500)
df.groupby(('quarter'), sort=False)['rent_per_square_meter'].max()
Out[39]:
quarter
Farmsen-Berne           22.800
Wandsbek                26.004
Wilstorf                17.760
Rahlstedt               21.336
Blankenese              25.308
Langenhorn              19.596
Neugraben-Fischbek      20.004
Marmstorf               27.144
Barmbek-Nord            32.004
Eimsbüttel              32.640
Uhlenhorst              35.220
Winterhude              33.132
Harburg                 32.280
Eißendorf               24.000
Poppenbüttel            18.228
Billstedt               18.000
Barmbek-Süd             27.468
Groß Flottbek           24.996
Eilbek                  27.276
Ohlsdorf                29.604
Horn                    20.040
Bergedorf               26.184
Eidelstedt              22.284
Niendorf                24.000
Eppendorf               32.004
Hamm-Mitte              20.352
Groß Borstel            33.756
Marienthal              35.496
Jenfeld                 18.444
Schnelsen               18.180
Hamm-Nord               22.800
St. Pauli               27.540
Harvestehude            35.028
Hamb.-Altstadt          34.104
Altona-Altstadt         30.636
Rotherbaum              33.924
Lohbrügge               17.328
Volksdorf               19.500
Lokstedt                27.000
Steilshoop              17.076
HafenCity               35.400
Rissen                  23.460
Fuhlsbüttel             22.104
Sasel                   19.200
Sülldorf                28.128
Neustadt                29.544
Hummelsbüttel           20.856
Bahrenfeld              31.200
Hohenfelde              33.792
Hoheluft-West           30.000
St. Georg               31.920
Wohldorf-Ohlstedt       19.200
Heimfeld                33.348
Othmarschen             30.000
Bramfeld                27.996
Stellingen              30.672
Wilhelmsburg            32.712
Iserbrook               20.004
Kleiner Grasbrook       25.128
Nienstedten             26.940
Neuenfelde              17.076
Alsterdorf              24.000
Ottensen                35.196
Lurup                   21.600
Borgfelde               24.828
Altona-Nord             32.220
Wellingsbüttel          23.148
Finkenwerder            30.000
Hammerbrook             26.112
Osdorf                  25.932
Tonndorf                19.728
Dulsberg                19.620
Hoheluft-Ost            26.292
Sternschanze            28.140
Bergstedt               18.396
Hamm-Süd                23.964
Rönneburg               15.000
Duvenstedt              17.076
Hausbruch               18.000
Allermöhe               14.460
Veddel                  13.500
Rothenburgsort          20.604
Cranz                   10.416
Tatenberg               13.356
Langenbek               13.092
Lemsahl-Mellingstedt    18.636
Altengamme              13.200
Steinwerder             18.780
Spadenland              14.304
Sinstorf                16.800
Ochsenwerder            16.620
Kirchwerder             13.824
Francop                  9.660
Neuengamme              13.284
Curslack                14.244
Reitbrook               13.200
Billwerder              15.996
Moorburg                11.196
Billbrook               11.400
Moorfleet               12.000
Name: rent_per_square_meter, dtype: float64

Quarter names and everything looks fine, lets check the same with a box plot.

In [40]:
df.pivot(columns='quarter', values='rent_per_square_meter').iplot(
        kind='box',
        yTitle='Rent Per Square Meter',
        title='Quarter and Rent')

The maximum rent is 35.496 in the Marienthal quarter followed by hafencity with 35.4. Cranz has the maximum of 10.416, median 9.816 and minimum 8.952.

In [41]:
# Group by city and quarter just to simply check the names. 
pd.set_option('display.max_rows', 500)
df.groupby(('city'), sort=False)['quarter'].max()
Out[41]:
city
Hamburg                                        Wohldorf-Ohlstedt
Hamburg Niendorf                                        Niendorf
Hamburg Marienthal                                      Wandsbek
Hamburg Jenfeld                                          Jenfeld
Hamburg Schnelsen                                      Schnelsen
Hamburg Eppendorf                                   Hoheluft-Ost
Hamburg HafenCity                                      HafenCity
Eidelstedt                                            Eidelstedt
Neumünster                                              Ohlsdorf
Hamburg Klein Flottbek                               Othmarschen
Hamburg Winterhude                                    Winterhude
Hamburg Harvestehude                                  Rotherbaum
Hamburg hummelsbüttel                              Hummelsbüttel
Hamburg (HH-Stellingen)                               Stellingen
Uhlenhorst                                            Uhlenhorst
Hamburg Heimfeld                                        Heimfeld
Hamburg Barmbek                                      Barmbek-Süd
Hamburg Lohbrügge                                      Lohbrügge
Rissen                                                    Rissen
Hamburg Hafencity                                      HafenCity
Hamburg Lokstedt                                        Lokstedt
Hamburg Ojendorf/Billstedt                             Billstedt
Hamburg Bahrenfeld                                 Groß Flottbek
Hamburg Rahlstedt                                      Rahlstedt
Hamburg Sasel                                              Sasel
Hamburg Altona                                          Ottensen
Hamburg Farmsen-Berne                              Farmsen-Berne
Hamburg Curslack                                        Curslack
Hamburg Horn                                                Horn
Hamburg St. Pauli                                      St. Pauli
Hamburg Bergedorf                                      Lohbrügge
Hamburg Sankt Pauli                                    St. Pauli
Hamburg Hoheluft                                   Hoheluft-West
Hamburg Barmbek Nord                                Barmbek-Nord
Hamburg Barmbek-Nord                                Barmbek-Nord
Hamburg Ottensen                                        Ottensen
Hamburg Hamm                                           Hamm-Nord
Hamburg Eimsbüttel                                    Eimsbüttel
Hamburg Cranz                                              Cranz
Hamburg Poppenbüttel                                Poppenbüttel
Hamburg Alsterdorf                                    Alsterdorf
Hamburg Altona-Altstadt                              Altona-Nord
Hamburg Wellingsbüttel                            Wellingsbüttel
Hamburg Marmstorf                                      Marmstorf
Hamburg Wandsbek                                        Wandsbek
Hamburg Neugraben                             Neugraben-Fischbek
Hamburg Finkenwerder                                Finkenwerder
Hamburg HafenCity, Osakaallee 2                      Steinwerder
Hamburg Rissen                                            Rissen
Hamburg Farmsen                                    Farmsen-Berne
Hamburg Lurup                                              Lurup
Hamburg Brahmfeld                                       Bramfeld
Hamburg Bramfeld                                        Bramfeld
Winterhude                                            Winterhude
Hamburg Tonndorf                                        Tonndorf
Hamburg Eilbek                                            Eilbek
Hamburg Hausbruch                                      Hausbruch
Hamburg Rotherbaum                                    Rotherbaum
St. Pauli                                              St. Pauli
Hamburg Eidelstedt                                    Eidelstedt
Hamburg Hamm-Nord                                      Hamm-Nord
Hamburg Altona-Nord                                     Ottensen
Hamburg Langenhorn                                    Langenhorn
Hamburg Dulsberg                                        Dulsberg
Hamburg Wilhelmsburg                                Wilhelmsburg
Hamburg Barmbek-Süd                                   Uhlenhorst
Hamburg Uhlenhorst                                    Uhlenhorst
Hamburg Hamburg                                          Harburg
Hamburg Ochsenwerder                                Ochsenwerder
Hamburg (Groß Flottbek)                            Groß Flottbek
Hamburg Alt Rahlstedt                                  Rahlstedt
Hamburg Fischbek                              Neugraben-Fischbek
Hamburg OT Lemsahl-Mellingstedt             Lemsahl-Mellingstedt
Hamburg Neustadt                                        Neustadt
Hamburg Groß Flottbek                              Groß Flottbek
Hamburg Borgfelde                                      Borgfelde
Barmbek-Nord                                        Barmbek-Nord
Hamburg Wilstorf                                        Wilstorf
Hamburg Iserbrook                                       Sülldorf
Hamburg Eißendorf                                        Harburg
Uhlenhorst Hamburg                                    Uhlenhorst
Wilhelmsburg                                        Wilhelmsburg
Braak                                                  Rahlstedt
Hamburg Wohldorf-Ohlstedt                      Wohldorf-Ohlstedt
Hamburg Kirchwerder                                  Kirchwerder
Hamburg (Tonndorf)                                       Jenfeld
Hamburg (Schierenberg)                                 Rahlstedt
Hamburg Stellingen                                    Stellingen
HamburgBergedorf                                       Bergedorf
Hoheluft-West                                      Hoheluft-West
Hamburg Stelling                                      Stellingen
Hamburg Groß Borstel                                    Niendorf
Hamburg Meiendorf                                      Rahlstedt
HamburgBillstedt                                       Billstedt
Hamburg Hohenfelde                                    Uhlenhorst
Hamburg Ohlsdorf                                        Ohlsdorf
Hamburg RAHLSTEDT/ OLDENFELDE                          Rahlstedt
Hamburg Bergstedt                                      Bergstedt
Hamburg Othmarschen                                  Othmarschen
Hamburg Fuhlsbüttel                                 Groß Borstel
Hamburg Neugraben Fischbek                    Neugraben-Fischbek
Hamburg Blankenese                                   Nienstedten
Neugraben - Fischbek                          Neugraben-Fischbek
Hamburg Rahlstedt Meiendorf                            Rahlstedt
Hamburg Ohlstedt                               Wohldorf-Ohlstedt
Hamburg Hummelsbüttel                              Hummelsbüttel
Hamburg (Lokstedt)                                      Lokstedt
St. Georg                                            Hammerbrook
Jenfeld                                                  Jenfeld
Hamburg St. Georg                                      St. Georg
Hamburg                                               Winterhude
Borgfelde                                              Borgfelde
Nienstedten                                          Nienstedten
Hamburg Dockenhuden                                   Blankenese
Hamburg Rothenburgsort                            Rothenburgsort
Hummelsbüttel                                      Hummelsbüttel
Hamburg Allermöhe                                      Allermöhe
Hamburg Hammerbrook                                  Hammerbrook
Hamburg Volksdorf                                      Volksdorf
Hamburg OT Bahrenfeld                                 Bahrenfeld
Neugraben-Fischbek                            Neugraben-Fischbek
Farmsen-Berne                                          Rahlstedt
Rahlstedt-Zentrum                                      Rahlstedt
Hamburg (Finkenwerder)                              Finkenwerder
Hamburg Altona/St.Pauli                                St. Pauli
Iserbrook                                              Iserbrook
Hamburg Alt-Osdorf                                        Osdorf
Hamburg Winterude Forsmannstr.                        Winterhude
Hamburg Alt Rahlstedt/Oldenfelde                       Rahlstedt
Bergedorf                                              Bergedorf
Lemsahl-Mellingstedt                        Lemsahl-Mellingstedt
Hamburg Sinstorf                                        Sinstorf
Hamburg Neuwiedenthal                                  Hausbruch
Bahrenfeld                                            Bahrenfeld
Hamburg Billstedt                                      Billstedt
Hamburg Marmsdorf                                      Marmstorf
Hamburg Hoheluft-West                              Hoheluft-West
Ochsenwerder                                        Ochsenwerder
Rahlstedt                                              Rahlstedt
HamburgKlein Flottbek                                Nienstedten
Hamburg MEIENDORF                                      Rahlstedt
Bramfeld                                                Bramfeld
Hamburg Barmbek Süd                                  Barmbek-Süd
Othmarschen                                          Othmarschen
Hamburg Grindelhof                                    Rotherbaum
Hamburg Hummelsbüttel / Fuhlsbüttel Nord           Hummelsbüttel
Hamburg Altstadt                                  Hamb.-Altstadt
Hamburg bramfeld                                        Bramfeld
Hamburg Osdorf                                            Osdorf
Hamburg Hoheluft-Ost                                Hoheluft-Ost
Hamburg Duhlsberg                                       Dulsberg
Hamburg Duvenstedt                                    Duvenstedt
Hamburg (Eimsbüttel)                                  Eimsbüttel
Hamburg Mümmelmannsberg                                Billstedt
Barmbek                                             Barmbek-Nord
Harvestehude                                        Harvestehude
Hamburg WInterhude                                    Winterhude
Lokstedt                                                Lokstedt
Hamburg HafenCity, Singapurstraße 19                 Steinwerder
Hamburg Nienstedten                                  Nienstedten
Hamburg Berne                                          Rahlstedt
Hamburg Rönneburg                                      Rönneburg
Hamburg Altona (Lurup)                                     Lurup
Volksdorf                                              Volksdorf
Hamburg(OT: Rönneburg)                                  Wilstorf
Hamburg (Alt-Osdorf)                                      Osdorf
Hamburg Binnenhafen                                      Harburg
Hamburg St Pauli                                       St. Pauli
Hamburg Hoheluft Ost                                Hoheluft-Ost
Hamburg Eppedorf                                       Eppendorf
HamburgHamburg                                           Harburg
Hamburg Allermöhe 21037                              Kirchwerder
Eppendorf                                              Eppendorf
Negraben-Fischbek                             Neugraben-Fischbek
Hamburg Hamburg Altona-Altstadt                  Altona-Altstadt
Hamburg Vierlanden                                    Neuengamme
HH-Hoheluft-West                                   Hoheluft-West
Barmbel Büd                                          Barmbek-Süd
Hamburg Moorfleet                                      Moorfleet
Hamburg Sülldorf                                        Sülldorf
Hamburg (Barmbek-Süd)                                Barmbek-Süd
Moorfleet                                              Allermöhe
Hamburg Sankt Georg                                    St. Georg
Hamburg (Othmarschen)                                Othmarschen
Name: quarter, dtype: object
In [42]:
# rent per square meter with the poscode
pd.set_option('display.max_rows', 500)
df.groupby(('postcode'), sort=True)['rent_per_square_meter'].max()
Out[42]:
postcode
2103     15.288
20085    22.764
20095    25.536
20097    26.112
20099    31.920
20144    30.000
20146    33.924
20148    26.400
20149    35.028
20173    29.640
20175    10.284
20179    14.688
20235    17.364
20249    28.800
20251    32.004
20253    30.000
20255    22.968
20257    25.824
20259    28.500
20354    26.076
20355    29.544
20357    32.640
20359    27.540
20457    35.400
20459    29.448
20529    16.344
20535    24.828
20537    23.964
20539    25.128
20839    14.880
20881    14.184
20999    17.292
21029    26.184
21031    17.328
21033    15.996
21035    17.772
21037    16.620
21039    14.244
21073    32.280
21075    33.348
21077    27.144
21079    28.584
21097    19.812
21107    18.756
21109    32.712
21129    30.000
21147    16.272
21149    20.004
21217    15.996
21703    10.896
22030    15.552
22041    35.496
22043    21.816
22045    17.652
22047    20.928
22049    19.620
22071    15.144
22081    30.000
22083    27.468
22085    35.220
22087    33.792
22089    27.276
22111    20.040
22113    12.000
22115    13.800
22117    16.680
22119    16.788
22133     8.676
22143    19.236
22145    21.336
22147    15.600
22149    19.164
22159    18.168
22175    16.008
22177    19.080
22179    27.996
22229    12.768
22253    15.780
22296    20.556
22297    24.000
22299    33.132
22301    31.584
22303    31.872
22305    29.196
22307    32.004
22309    23.808
22335    29.604
22337    18.612
22339    20.856
22341    11.352
22350    16.740
22359    19.500
22391    23.148
22393    19.200
22395    18.396
22397    19.200
22399    17.712
22415    20.004
22417    16.776
22419    19.416
22453    33.756
22455    18.840
22457    18.180
22459    24.000
22523    18.264
22525    27.000
22527    30.672
22529    26.400
22532    14.100
22547    17.688
22549    25.932
22559    23.460
22572    16.164
22587    25.308
22589    28.128
22605    27.024
22607    26.940
22609    22.200
22647    14.400
22719    16.608
22761    30.492
22763    31.536
22765    35.196
22767    30.636
22769    27.456
22800    14.088
22801    10.200
22807    18.000
22999    15.228
23564    15.828
27661    17.400
Name: rent_per_square_meter, dtype: float64

The maimum rent in the list is showing in the postcode 20457 with 35.400. We know that the maimum rent of 35.4 is in Hamburg or the Hafencity. Lets check the where this zipcode belongs to.

In [43]:
pd.set_option('display.max_rows', 500)
df.groupby(('postcode'), sort=True)['city'].max()
Out[43]:
postcode
2103                           Hamburg
20085                          Hamburg
20095                          Hamburg
20097                        St. Georg
20099                Hamburg St. Georg
20144            Hamburg Hoheluft-West
20146               Hamburg Rotherbaum
20148               Hamburg Rotherbaum
20149                     Harvestehude
20173                          Hamburg
20175                          Hamburg
20179                          Hamburg
20235                     Hamburg Hamm
20249             Hamburg Harvestehude
20251             Hamburg Hoheluft-Ost
20253                    Hoheluft-West
20255               Hamburg Eimsbüttel
20257               Hamburg Eimsbüttel
20259               Hamburg Eimsbüttel
20354               Hamburg Rotherbaum
20355                 Hamburg Neustadt
20357               Hamburg Eimsbüttel
20359                        St. Pauli
20457                Hamburg Hafencity
20459                 Hamburg Neustadt
20529                          Hamburg
20535                Hamburg Hamm-Nord
20537                     Hamburg Hamm
20539           Hamburg Rothenburgsort
20839                          Hamburg
20881                          Hamburg
20999                          Hamburg
21029                 HamburgBergedorf
21031                Hamburg Lohbrügge
21033                Hamburg Lohbrügge
21035                Hamburg Bergedorf
21037                     Ochsenwerder
21039               Hamburg Vierlanden
21073                   HamburgHamburg
21075                 Hamburg Heimfeld
21077                 Hamburg Wilstorf
21079           Hamburg(OT: Rönneburg)
21097                          Hamburg
21107                     Wilhelmsburg
21109                     Wilhelmsburg
21129             Hamburg Finkenwerder
21147            Hamburg Neuwiedenthal
21149               Neugraben-Fischbek
21217                          Hamburg
21703                          Hamburg
22030                          Hamburg
22041                 Hamburg Wandsbek
22043               Hamburg Marienthal
22045                          Jenfeld
22047                 Hamburg Wandsbek
22049                 Hamburg Wandsbek
22071                          Hamburg
22081                       Uhlenhorst
22083                       Winterhude
22085               Uhlenhorst Hamburg
22087                       Uhlenhorst
22089                 Hamburg Wandsbek
22111                     Hamburg Horn
22113                        Moorfleet
22115          Hamburg Mümmelmannsberg
22117                 HamburgBillstedt
22119                     Hamburg Horn
22133                          Hamburg
22143                        Rahlstedt
22145      Hamburg Rahlstedt Meiendorf
22147    Hamburg RAHLSTEDT/ OLDENFELDE
22149                Rahlstedt-Zentrum
22159            Hamburg Farmsen-Berne
22175                 Hamburg Bramfeld
22177                 Hamburg bramfeld
22179                 Hamburg Bramfeld
22229                          Hamburg
22253                          Hamburg
22296                          Hamburg
22297               Hamburg Winterhude
22299               Hamburg Winterhude
22301               Hamburg Winterhude
22303                       Winterhude
22305             Hamburg Barmbek-Nord
22307             Hamburg Barmbek-Nord
22309                          Hamburg
22335                       Neumünster
22337                 Hamburg Ohlsdorf
22339                    Hummelsbüttel
22341                          Hamburg
22350                          Hamburg
22359                        Volksdorf
22391           Hamburg Wellingsbüttel
22393                    Hamburg Sasel
22395                    Hamburg Sasel
22397             Lemsahl-Mellingstedt
22399             Hamburg Poppenbüttel
22415               Hamburg Langenhorn
22417               Hamburg Langenhorn
22419               Hamburg Langenhorn
22453                 Hamburg Niendorf
22455                 Hamburg Niendorf
22457                Hamburg Schnelsen
22459                Hamburg Schnelsen
22523               Hamburg Eidelstedt
22525               Hamburg Stellingen
22527               Hamburg Stellingen
22529                         Lokstedt
22532                          Hamburg
22547                    Hamburg Lurup
22549                   Hamburg Osdorf
22559                           Rissen
22572                          Hamburg
22587              Hamburg Dockenhuden
22589                        Iserbrook
22605                      Othmarschen
22607            Hamburg OT Bahrenfeld
22609                      Nienstedten
22647                          Hamburg
22719                          Hamburg
22761               Hamburg Bahrenfeld
22763                 Hamburg Ottensen
22765                 Hamburg Ottensen
22767                Hamburg St. Pauli
22769              Hamburg Altona-Nord
22800                          Hamburg
22801                          Hamburg
22807                          Hamburg
22999                          Hamburg
23564                          Hamburg
27661                          Hamburg
Name: city, dtype: object

From the above list we can see that there is 2103 postcode of 4 digits which seems wrong. We will remove that. Postcode 20457 belongs to Hafencity. We dont know where this post code belongs to, the possibilities could be:
21031 - Hamburg Lohbrügge
21033 - Hamburg Lohbrügge
21035 - Hamburg Bergedorf
21039 - Hamburg Vierlanden
We will remove this from the dataset.

In [44]:
df = df[df.postcode != 2103]
In [45]:
pd.set_option('display.max_rows', 500)
df.groupby(('postcode'), sort=True)['city'].max()
Out[45]:
postcode
20085                          Hamburg
20095                          Hamburg
20097                        St. Georg
20099                Hamburg St. Georg
20144            Hamburg Hoheluft-West
20146               Hamburg Rotherbaum
20148               Hamburg Rotherbaum
20149                     Harvestehude
20173                          Hamburg
20175                          Hamburg
20179                          Hamburg
20235                     Hamburg Hamm
20249             Hamburg Harvestehude
20251             Hamburg Hoheluft-Ost
20253                    Hoheluft-West
20255               Hamburg Eimsbüttel
20257               Hamburg Eimsbüttel
20259               Hamburg Eimsbüttel
20354               Hamburg Rotherbaum
20355                 Hamburg Neustadt
20357               Hamburg Eimsbüttel
20359                        St. Pauli
20457                Hamburg Hafencity
20459                 Hamburg Neustadt
20529                          Hamburg
20535                Hamburg Hamm-Nord
20537                     Hamburg Hamm
20539           Hamburg Rothenburgsort
20839                          Hamburg
20881                          Hamburg
20999                          Hamburg
21029                 HamburgBergedorf
21031                Hamburg Lohbrügge
21033                Hamburg Lohbrügge
21035                Hamburg Bergedorf
21037                     Ochsenwerder
21039               Hamburg Vierlanden
21073                   HamburgHamburg
21075                 Hamburg Heimfeld
21077                 Hamburg Wilstorf
21079           Hamburg(OT: Rönneburg)
21097                          Hamburg
21107                     Wilhelmsburg
21109                     Wilhelmsburg
21129             Hamburg Finkenwerder
21147            Hamburg Neuwiedenthal
21149               Neugraben-Fischbek
21217                          Hamburg
21703                          Hamburg
22030                          Hamburg
22041                 Hamburg Wandsbek
22043               Hamburg Marienthal
22045                          Jenfeld
22047                 Hamburg Wandsbek
22049                 Hamburg Wandsbek
22071                          Hamburg
22081                       Uhlenhorst
22083                       Winterhude
22085               Uhlenhorst Hamburg
22087                       Uhlenhorst
22089                 Hamburg Wandsbek
22111                     Hamburg Horn
22113                        Moorfleet
22115          Hamburg Mümmelmannsberg
22117                 HamburgBillstedt
22119                     Hamburg Horn
22133                          Hamburg
22143                        Rahlstedt
22145      Hamburg Rahlstedt Meiendorf
22147    Hamburg RAHLSTEDT/ OLDENFELDE
22149                Rahlstedt-Zentrum
22159            Hamburg Farmsen-Berne
22175                 Hamburg Bramfeld
22177                 Hamburg bramfeld
22179                 Hamburg Bramfeld
22229                          Hamburg
22253                          Hamburg
22296                          Hamburg
22297               Hamburg Winterhude
22299               Hamburg Winterhude
22301               Hamburg Winterhude
22303                       Winterhude
22305             Hamburg Barmbek-Nord
22307             Hamburg Barmbek-Nord
22309                          Hamburg
22335                       Neumünster
22337                 Hamburg Ohlsdorf
22339                    Hummelsbüttel
22341                          Hamburg
22350                          Hamburg
22359                        Volksdorf
22391           Hamburg Wellingsbüttel
22393                    Hamburg Sasel
22395                    Hamburg Sasel
22397             Lemsahl-Mellingstedt
22399             Hamburg Poppenbüttel
22415               Hamburg Langenhorn
22417               Hamburg Langenhorn
22419               Hamburg Langenhorn
22453                 Hamburg Niendorf
22455                 Hamburg Niendorf
22457                Hamburg Schnelsen
22459                Hamburg Schnelsen
22523               Hamburg Eidelstedt
22525               Hamburg Stellingen
22527               Hamburg Stellingen
22529                         Lokstedt
22532                          Hamburg
22547                    Hamburg Lurup
22549                   Hamburg Osdorf
22559                           Rissen
22572                          Hamburg
22587              Hamburg Dockenhuden
22589                        Iserbrook
22605                      Othmarschen
22607            Hamburg OT Bahrenfeld
22609                      Nienstedten
22647                          Hamburg
22719                          Hamburg
22761               Hamburg Bahrenfeld
22763                 Hamburg Ottensen
22765                 Hamburg Ottensen
22767                Hamburg St. Pauli
22769              Hamburg Altona-Nord
22800                          Hamburg
22801                          Hamburg
22807                          Hamburg
22999                          Hamburg
23564                          Hamburg
27661                          Hamburg
Name: city, dtype: object

Now its clean with names. We can save the cleaned data.

In [46]:
# Save cleaned dataset
cleaned_data = df.to_csv(r'rent_cleaned.csv', index=id, header=True)
In [47]:
# Read dataset
df = pd.read_csv('rent_cleaned.csv', index_col=0)
df.head()
Out[47]:
living_space rooms cold_rent quarter city postcode rent_per_square_meter publish_date latitude longitude
0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 14.976 2019-05-30 22:00:00 53.597793 10.121997
1 78.0 3.0 819.6 Wandsbek Hamburg 22047 12.612 2019-01-31 23:00:00 53.586691 10.086159
2 62.4 3.5 504.0 Wilstorf Hamburg 21079 9.696 2019-01-31 23:00:00 53.444908 9.996353
3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 13.800 2019-04-30 22:00:00 53.624685 10.145269
4 144.0 3.5 2280.0 Blankenese Hamburg 22587 18.996 2019-03-28 23:00:00 53.559652 9.799682
In [48]:
df.isnull().sum()
Out[48]:
living_space             0
rooms                    0
cold_rent                0
quarter                  0
city                     0
postcode                 0
rent_per_square_meter    0
publish_date             0
latitude                 0
longitude                0
dtype: int64

let's do some analysis and plot some graphs to understand the dataset better. let's start with the rent price range and do some more analysis and

In [49]:
print(f"Advertised prices range from €{min(df.rent_per_square_meter)} to €{max(df.rent_per_square_meter)}.")
Advertised prices range from €6.0 to €35.496.
In [50]:
# Distribution of prices from €6.0 to €35.496
plt.figure(figsize=(20,4))
df.rent_per_square_meter.hist(bins=10, range=(0,100))
plt.margins(x=0)
plt.axvline(15, color='blue', linestyle='--')
plt.title("Rent Prices Per Square Meter", fontsize=16)
plt.xlabel("Price Per SQ Meter(€)")
plt.ylabel("Number of listings")
plt.show()
In [51]:
print(f"Advertised prices range from €{min(df.cold_rent)} to €{max(df.cold_rent)}.")
Advertised prices range from €159.6 to €4788.0.
In [52]:
# Distribution of prices from €159.6 to €4788.0
plt.figure(figsize=(20,4))
df.cold_rent.hist(bins=100, range=(159,4788))
plt.margins(x=0)
plt.axvline(1000, color='blue', linestyle='--')
plt.title("Cold Rent Prices", fontsize=16)
plt.xlabel("Price Cold rent(€)")
plt.ylabel("Number of listings")
plt.show()
In [53]:
# Distribution of prices between €1000 and €2000 
plt.figure(figsize=(20,4))
df.cold_rent.hist(bins=100, range=(159,4788))
plt.margins(x=0)
plt.axvline(1000, color='green', linestyle='--')
plt.axvline(2000, color='red', linestyle='--')
plt.title("Cold Rent up to €1000", fontsize=16)
plt.xlabel("Price (€)")
plt.ylabel("Number of listings")
plt.show()

We can see that there are more apartments for rent under 1000 Euros whereas less number of apartments for rent above 1000 Euros. After 2000 Euros there are few apartments left for the rent.

Which is the most common house (Bedroom wise) ?

In [54]:
df['rooms'].value_counts().plot(kind='bar')
plt.title("Number of Rooms")
plt.xlabel("Rooms")
plt.ylabel("Count")
Out[54]:
Text(0, 0.5, 'Count')

As we can see from the visualization 2.5 bedroom houses are most common for the rent followed by 3.5 bedroom.

In [55]:
df['quarter'].value_counts().iplot(
        kind='bar',
        yTitle='Most Apartments for Rent in which Quarter',
        title='Quarter')

Winterhude has the highest number of apartments for the rent.

How common factors are affecting the price of the houses ?

What factors affecting the prices of the house? Let us start with , If price is getting affecting by living area of the house or not ?

In [56]:
plt.scatter(df.cold_rent, df.living_space)
plt.title("Cold Rent Vs Living Space")
plt.xlabel("Cold Rent")
plt.ylabel("Living SPace")
plt.show()

The scatter plot above shows that the data points are in the linear direction and it clearly shows that if living area increases the price increases.

In [57]:
plt.scatter(df.cold_rent, df.rooms)
plt.title("Rooms and Price")
plt.xlabel("Cold Rent")
plt.ylabel("Rooms")
plt.show()
In [58]:
plt.scatter(df.postcode, df.cold_rent)
plt.title("price according to the location")
Out[58]:
Text(0.5, 1.0, 'price according to the location')

From the above figure, we can see that there are some wrong postcodes in the dataset as well. It looks like Hamburg postcode is between 20000 to 23000 but there are some wrong zipcodes.

Does published date affect the rent?

In [59]:
df.pivot(columns='publish_date', values='rent_per_square_meter').iplot(
        kind='box',
        yTitle='Rent Per Square Meter',
        title='Published Dates and Rent')

Yes, it does look like newly published apartments are a little higher than the old published dates. Before July 2016 until Jan 2017 the property rents are low but after July 2017 the rents are almost the same. We could also do the same with the construction year but unfortunately, we had to get rid of that because of so many missing values.

In [60]:
df.pivot(columns='rooms', values='cold_rent').iplot(
        kind='box',
        yTitle='Rent Per Square Meter',
        title='Rooms and Rent')

4. Geo-Data extract from Geofabrik

"http://download.geofabrik.de/europe/germany/hamburg.html". This website is an open portal and has data extracts from the OpenStreetMap project which are normally updated every day. Select your continent and then your country of interest from the list below. This open data download service is offered free of charge by Geofabrik GmbH. I downloaded the shapefile of venues for Hamburg and then the shapefile is read and converted to GeoJSON. We converted to GeoJSON to get the geometry out of the shapefile.

In [61]:
# Importing the shape file of the venues and converting to GeoJSON
import geopandas
myshpfile = geopandas.read_file('shape_file/gis_osm_pois_free_1.shp')
myshpfile.to_file('myJson.geojson', driver='GeoJSON')
In [62]:
# Importing the city venues or places as a GeoJSON file as a dataframe in geopandas
df2 = gpd.read_file(r'myJson.geojson')
df2.head()
Out[62]:
osm_id code fclass name geometry
0 15364587 2032 recycling_paper None POINT (10.02632 53.62422)
1 15839230 2205 playground None POINT (9.98063 53.44816)
2 21545142 2205 playground None POINT (9.87625 53.59130)
3 25149456 2007 library Bücherhalle Barmbek POINT (10.04447 53.58624)
4 25699152 2101 pharmacy Markt Apotheke Neugraben POINT (9.85287 53.47095)
In [63]:
# Check the missing values
df2.isna().sum()
Out[63]:
osm_id          0
code            0
fclass          0
name        15604
geometry        0
dtype: int64

Now we need to concatenate the venue dataset and the rent dataset. For that, both datasets should have point geometry. So, we will create geometry for a rent dataset from its latitude and longitude

In [64]:
df3 = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))
In [65]:
df3.head()
Out[65]:
living_space rooms cold_rent quarter city postcode rent_per_square_meter publish_date latitude longitude geometry
0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 14.976 2019-05-30 22:00:00 53.597793 10.121997 POINT (10.12200 53.59779)
1 78.0 3.0 819.6 Wandsbek Hamburg 22047 12.612 2019-01-31 23:00:00 53.586691 10.086159 POINT (10.08616 53.58669)
2 62.4 3.5 504.0 Wilstorf Hamburg 21079 9.696 2019-01-31 23:00:00 53.444908 9.996353 POINT (9.99635 53.44491)
3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 13.800 2019-04-30 22:00:00 53.624685 10.145269 POINT (10.14527 53.62468)
4 144.0 3.5 2280.0 Blankenese Hamburg 22587 18.996 2019-03-28 23:00:00 53.559652 9.799682 POINT (9.79968 53.55965)

Now to concatenate the two datasets we need the nearest neighbour so that we can get the venues which are nearby to the apartment. I will use scipy's cKDTree spatial index.query method which returns very fast results for nearest neighbour searches. As it uses a spatial index it's orders of magnitude faster than looping through the data frame and then finding the minimum of all distances. It is also faster than using shapely's nearest_points with RTree (the spatial index method available via geopandas) because cKDTree allows you to vectorize your search whereas the other method does not.

Here is a helper function that will return the distance and 'Name' of the nearest neighbour from our external venues dataset to each point in our Condo Rent dataset. It assumes both datasets have a geometry column of points.

Not only we get the nearest venue or place to the apartment we will get the distance in meters as well. Later we can also convert the meters in Kilometers.

In [66]:
from scipy.spatial import cKDTree
from shapely.geometry import Point


def ckdnearest(gdA, gdB):
    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    gdf = pd.concat(
        [gdA.reset_index(drop=True), gdB.loc[idx, gdB.columns != 'geometry'].reset_index(drop=True),
         pd.Series(dist, name='dist')], axis=1)
    return gdf

ckdnearest(df3, df2)
Out[66]:
living_space rooms cold_rent quarter city postcode rent_per_square_meter ... longitude geometry osm_id code fclass name dist
0 116.40 4.5 1453.200 Farmsen-Berne Hamburg 22159 14.976 ... 10.121997 POINT (10.12200 53.59779) 1190663105 2205 playground None 0.002903
1 78.00 3.0 819.600 Wandsbek Hamburg 22047 12.612 ... 10.086159 POINT (10.08616 53.58669) 4100013663 2514 chemist Budnikowsky 0.000321
2 62.40 3.5 504.000 Wilstorf Hamburg 21079 9.696 ... 9.996353 POINT (9.99635 53.44491) 3517456812 2724 memorial Berthold Bormann 0.000318
3 98.40 3.5 1131.600 Rahlstedt Hamburg 22145 13.800 ... 10.145269 POINT (10.14527 53.62468) 6348457833 2593 vending_any None 0.000537
4 144.00 3.5 2280.000 Blankenese Hamburg 22587 18.996 ... 9.799682 POINT (9.79968 53.55965) 308012942 2902 bench None 0.000655
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12405 78.84 3.5 684.000 Rahlstedt Hamburg 22147 10.416 ... 10.130409 POINT (10.13041 53.60460) 942054400 2303 cafe Kappa Due 0.001595
12406 60.00 2.5 930.000 Barmbek-Süd Hamburg 22083 18.600 ... 10.029646 POINT (10.02965 53.57566) 6028150307 2303 cafe LORDS Deli 0.000320
12407 92.94 2.5 606.000 St. Pauli Hamburg 20359 7.824 ... 9.960613 POINT (9.96061 53.54918) 1887413582 2305 bar Quer Club 0.000261
12408 69.54 2.5 748.200 Winterhude Hamburg 22303 12.912 ... 10.018847 POINT (10.01885 53.58979) 693551445 2724 memorial Samuel Levy 0.001469
12409 73.20 3.0 518.532 Heimfeld Hamburg 21075 8.496 ... 9.965728 POINT (9.96573 53.46221) 3447246195 2724 memorial Paul R. Reinke 0.000446

12410 rows × 16 columns

In [67]:
ckdnearest(df3, df2).head()
Out[67]:
living_space rooms cold_rent quarter city postcode rent_per_square_meter ... longitude geometry osm_id code fclass name dist
0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 14.976 ... 10.121997 POINT (10.12200 53.59779) 1190663105 2205 playground None 0.002903
1 78.0 3.0 819.6 Wandsbek Hamburg 22047 12.612 ... 10.086159 POINT (10.08616 53.58669) 4100013663 2514 chemist Budnikowsky 0.000321
2 62.4 3.5 504.0 Wilstorf Hamburg 21079 9.696 ... 9.996353 POINT (9.99635 53.44491) 3517456812 2724 memorial Berthold Bormann 0.000318
3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 13.800 ... 10.145269 POINT (10.14527 53.62468) 6348457833 2593 vending_any None 0.000537
4 144.0 3.5 2280.0 Blankenese Hamburg 22587 18.996 ... 9.799682 POINT (9.79968 53.55965) 308012942 2902 bench None 0.000655

5 rows × 16 columns

In [68]:
# Save the merged dataset
cleaned_merged_data = ckdnearest(df3, df2).to_csv(r'ckdnearest(df3, df2).csv', index=id, header=True)
In [69]:
# Read dataset
df = pd.read_csv('ckdnearest(df3, df2).csv')
In [70]:
df.head()
Out[70]:
Unnamed: 0 living_space rooms cold_rent quarter city postcode ... longitude geometry osm_id code fclass name dist
0 0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 ... 10.121997 POINT (10.12199735210979 53.59779278086435) 1190663105 2205 playground NaN 0.002903
1 1 78.0 3.0 819.6 Wandsbek Hamburg 22047 ... 10.086159 POINT (10.0861590022222 53.58669136622085) 4100013663 2514 chemist Budnikowsky 0.000321
2 2 62.4 3.5 504.0 Wilstorf Hamburg 21079 ... 9.996353 POINT (9.996352831413693 53.44490810458736) 3517456812 2724 memorial Berthold Bormann 0.000318
3 3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 ... 10.145269 POINT (10.14526927662929 53.62468452926397) 6348457833 2593 vending_any NaN 0.000537
4 4 144.0 3.5 2280.0 Blankenese Hamburg 22587 ... 9.799682 POINT (9.799681528125992 53.55965182828339) 308012942 2902 bench NaN 0.000655

5 rows × 17 columns

In [71]:
df.drop(['Unnamed: 0'], axis=1, inplace=True)
In [72]:
df.isna().sum()
Out[72]:
living_space                0
rooms                       0
cold_rent                   0
quarter                     0
city                        0
postcode                    0
rent_per_square_meter       0
publish_date                0
latitude                    0
longitude                   0
geometry                    0
osm_id                      0
code                        0
fclass                      0
name                     5509
dist                        0
dtype: int64

Only name column has empty data, we will keep this to check the names of the venues and later we can remove it at the time of prediction.

We have successfully concatenated our dataset and got the distance from the apartment to the venue as well. Now for the sake of simplicity, we can convert the distance in meters into kilometres.

In [73]:
# Convert Meters to Kilometers
df['dist'] = df['dist'].div(1000)
In [74]:
df.head()
Out[74]:
living_space rooms cold_rent quarter city postcode rent_per_square_meter ... longitude geometry osm_id code fclass name dist
0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 14.976 ... 10.121997 POINT (10.12199735210979 53.59779278086435) 1190663105 2205 playground NaN 2.903338e-06
1 78.0 3.0 819.6 Wandsbek Hamburg 22047 12.612 ... 10.086159 POINT (10.0861590022222 53.58669136622085) 4100013663 2514 chemist Budnikowsky 3.214926e-07
2 62.4 3.5 504.0 Wilstorf Hamburg 21079 9.696 ... 9.996353 POINT (9.996352831413693 53.44490810458736) 3517456812 2724 memorial Berthold Bormann 3.179704e-07
3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 13.800 ... 10.145269 POINT (10.14526927662929 53.62468452926397) 6348457833 2593 vending_any NaN 5.373154e-07
4 144.0 3.5 2280.0 Blankenese Hamburg 22587 18.996 ... 9.799682 POINT (9.799681528125992 53.55965182828339) 308012942 2902 bench NaN 6.548586e-07

5 rows × 16 columns

Now we can see the distance is in Kilometers. We just simply divided the distance with kilometres. Now we can see the distance between the apartment and the nearby venue or place.

5. Analysis of the city and the venue nearby

In [75]:
df.groupby('quarter').count()
Out[75]:
living_space rooms cold_rent city postcode rent_per_square_meter publish_date latitude longitude geometry osm_id code fclass name dist
quarter
Allermöhe 56 56 56 56 56 56 56 56 56 56 56 56 56 2 56
Alsterdorf 157 157 157 157 157 157 157 157 157 157 157 157 157 56 157
Altengamme 5 5 5 5 5 5 5 5 5 5 5 5 5 1 5
Altona-Altstadt 184 184 184 184 184 184 184 184 184 184 184 184 184 127 184
Altona-Nord 126 126 126 126 126 126 126 126 126 126 126 126 126 34 126
Bahrenfeld 172 172 172 172 172 172 172 172 172 172 172 172 172 108 172
Barmbek-Nord 401 401 401 401 401 401 401 401 401 401 401 401 401 216 401
Barmbek-Süd 324 324 324 324 324 324 324 324 324 324 324 324 324 224 324
Bergedorf 270 270 270 270 270 270 270 270 270 270 270 270 270 98 270
Bergstedt 25 25 25 25 25 25 25 25 25 25 25 25 25 5 25
Billbrook 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Billstedt 136 136 136 136 136 136 136 136 136 136 136 136 136 75 136
Billwerder 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Blankenese 138 138 138 138 138 138 138 138 138 138 138 138 138 67 138
Borgfelde 100 100 100 100 100 100 100 100 100 100 100 100 100 76 100
Bramfeld 256 256 256 256 256 256 256 256 256 256 256 256 256 128 256
Cranz 3 3 3 3 3 3 3 3 3 3 3 3 3 0 3
Curslack 16 16 16 16 16 16 16 16 16 16 16 16 16 8 16
Dulsberg 122 122 122 122 122 122 122 122 122 122 122 122 122 82 122
Duvenstedt 26 26 26 26 26 26 26 26 26 26 26 26 26 15 26
Eidelstedt 233 233 233 233 233 233 233 233 233 233 233 233 233 58 233
Eilbek 172 172 172 172 172 172 172 172 172 172 172 172 172 106 172
Eimsbüttel 356 356 356 356 356 356 356 356 356 356 356 356 356 290 356
Eißendorf 160 160 160 160 160 160 160 160 160 160 160 160 160 79 160
Eppendorf 297 297 297 297 297 297 297 297 297 297 297 297 297 232 297
Farmsen-Berne 111 111 111 111 111 111 111 111 111 111 111 111 111 28 111
Finkenwerder 59 59 59 59 59 59 59 59 59 59 59 59 59 15 59
Francop 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Fuhlsbüttel 115 115 115 115 115 115 115 115 115 115 115 115 115 68 115
Groß Borstel 78 78 78 78 78 78 78 78 78 78 78 78 78 21 78
Groß Flottbek 73 73 73 73 73 73 73 73 73 73 73 73 73 30 73
HafenCity 166 166 166 166 166 166 166 166 166 166 166 166 166 116 166
Hamb.-Altstadt 121 121 121 121 121 121 121 121 121 121 121 121 121 63 121
Hamm-Mitte 86 86 86 86 86 86 86 86 86 86 86 86 86 52 86
Hamm-Nord 148 148 148 148 148 148 148 148 148 148 148 148 148 109 148
Hamm-Süd 29 29 29 29 29 29 29 29 29 29 29 29 29 15 29
Hammerbrook 125 125 125 125 125 125 125 125 125 125 125 125 125 91 125
Harburg 321 321 321 321 321 321 321 321 321 321 321 321 321 234 321
Harvestehude 176 176 176 176 176 176 176 176 176 176 176 176 176 158 176
Hausbruch 62 62 62 62 62 62 62 62 62 62 62 62 62 16 62
Heimfeld 241 241 241 241 241 241 241 241 241 241 241 241 241 145 241
Hoheluft-Ost 87 87 87 87 87 87 87 87 87 87 87 87 87 74 87
Hoheluft-West 109 109 109 109 109 109 109 109 109 109 109 109 109 66 109
Hohenfelde 131 131 131 131 131 131 131 131 131 131 131 131 131 106 131
Horn 212 212 212 212 212 212 212 212 212 212 212 212 212 99 212
Hummelsbüttel 95 95 95 95 95 95 95 95 95 95 95 95 95 31 95
Iserbrook 48 48 48 48 48 48 48 48 48 48 48 48 48 11 48
Jenfeld 88 88 88 88 88 88 88 88 88 88 88 88 88 54 88
Kirchwerder 30 30 30 30 30 30 30 30 30 30 30 30 30 14 30
Kleiner Grasbrook 14 14 14 14 14 14 14 14 14 14 14 14 14 0 14
Langenbek 17 17 17 17 17 17 17 17 17 17 17 17 17 1 17
Langenhorn 320 320 320 320 320 320 320 320 320 320 320 320 320 110 320
Lemsahl-Mellingstedt 25 25 25 25 25 25 25 25 25 25 25 25 25 15 25
Lohbrügge 159 159 159 159 159 159 159 159 159 159 159 159 159 48 159
Lokstedt 185 185 185 185 185 185 185 185 185 185 185 185 185 83 185
Lurup 117 117 117 117 117 117 117 117 117 117 117 117 117 72 117
Marienthal 121 121 121 121 121 121 121 121 121 121 121 121 121 75 121
Marmstorf 70 70 70 70 70 70 70 70 70 70 70 70 70 30 70
Moorburg 4 4 4 4 4 4 4 4 4 4 4 4 4 0 4
Moorfleet 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2
Neuenfelde 10 10 10 10 10 10 10 10 10 10 10 10 10 2 10
Neuengamme 8 8 8 8 8 8 8 8 8 8 8 8 8 3 8
Neugraben-Fischbek 117 117 117 117 117 117 117 117 117 117 117 117 117 49 117
Neustadt 159 159 159 159 159 159 159 159 159 159 159 159 159 117 159
Niendorf 287 287 287 287 287 287 287 287 287 287 287 287 287 133 287
Nienstedten 46 46 46 46 46 46 46 46 46 46 46 46 46 23 46
Ochsenwerder 13 13 13 13 13 13 13 13 13 13 13 13 13 6 13
Ohlsdorf 119 119 119 119 119 119 119 119 119 119 119 119 119 48 119
Osdorf 93 93 93 93 93 93 93 93 93 93 93 93 93 41 93
Othmarschen 174 174 174 174 174 174 174 174 174 174 174 174 174 80 174
Ottensen 189 189 189 189 189 189 189 189 189 189 189 189 189 133 189
Poppenbüttel 106 106 106 106 106 106 106 106 106 106 106 106 106 54 106
Rahlstedt 524 524 524 524 524 524 524 524 524 524 524 524 524 216 524
Reitbrook 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2
Rissen 138 138 138 138 138 138 138 138 138 138 138 138 138 31 138
Rothenburgsort 42 42 42 42 42 42 42 42 42 42 42 42 42 26 42
Rotherbaum 184 184 184 184 184 184 184 184 184 184 184 184 184 170 184
Rönneburg 13 13 13 13 13 13 13 13 13 13 13 13 13 3 13
Sasel 109 109 109 109 109 109 109 109 109 109 109 109 109 49 109
Schnelsen 186 186 186 186 186 186 186 186 186 186 186 186 186 72 186
Sinstorf 26 26 26 26 26 26 26 26 26 26 26 26 26 1 26
Spadenland 4 4 4 4 4 4 4 4 4 4 4 4 4 3 4
St. Georg 149 149 149 149 149 149 149 149 149 149 149 149 149 107 149
St. Pauli 143 143 143 143 143 143 143 143 143 143 143 143 143 118 143
Steilshoop 64 64 64 64 64 64 64 64 64 64 64 64 64 9 64
Steinwerder 3 3 3 3 3 3 3 3 3 3 3 3 3 0 3
Stellingen 224 224 224 224 224 224 224 224 224 224 224 224 224 94 224
Sternschanze 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
Sülldorf 34 34 34 34 34 34 34 34 34 34 34 34 34 11 34
Tatenberg 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Tonndorf 117 117 117 117 117 117 117 117 117 117 117 117 117 63 117
Uhlenhorst 209 209 209 209 209 209 209 209 209 209 209 209 209 131 209
Veddel 17 17 17 17 17 17 17 17 17 17 17 17 17 12 17
Volksdorf 54 54 54 54 54 54 54 54 54 54 54 54 54 29 54
Wandsbek 380 380 380 380 380 380 380 380 380 380 380 380 380 229 380
Wellingsbüttel 90 90 90 90 90 90 90 90 90 90 90 90 90 46 90
Wilhelmsburg 132 132 132 132 132 132 132 132 132 132 132 132 132 64 132
Wilstorf 72 72 72 72 72 72 72 72 72 72 72 72 72 28 72
Winterhude 638 638 638 638 638 638 638 638 638 638 638 638 638 515 638
Wohldorf-Ohlstedt 21 21 21 21 21 21 21 21 21 21 21 21 21 12 21
In [76]:
print('There are {} unique categories.'.format(len(df['fclass'].unique())))
There are 101 unique categories.
In [77]:
df.groupby('fclass').count()
Out[77]:
living_space rooms cold_rent quarter city postcode rent_per_square_meter publish_date latitude longitude geometry osm_id code name dist
fclass
archaeological 12 12 12 12 12 12 12 12 12 12 12 12 12 4 12
arts_centre 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
artwork 149 149 149 149 149 149 149 149 149 149 149 149 149 63 149
atm 71 71 71 71 71 71 71 71 71 71 71 71 71 25 71
attraction 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17
bakery 218 218 218 218 218 218 218 218 218 218 218 218 218 210 218
bank 61 61 61 61 61 61 61 61 61 61 61 61 61 60 61
bar 115 115 115 115 115 115 115 115 115 115 115 115 115 113 115
battlefield 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
beauty_shop 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
bench 1081 1081 1081 1081 1081 1081 1081 1081 1081 1081 1081 1081 1081 1 1081
beverages 39 39 39 39 39 39 39 39 39 39 39 39 39 36 39
bicycle_rental 94 94 94 94 94 94 94 94 94 94 94 94 94 81 94
bicycle_shop 115 115 115 115 115 115 115 115 115 115 115 115 115 112 115
bookshop 31 31 31 31 31 31 31 31 31 31 31 31 31 29 31
butcher 19 19 19 19 19 19 19 19 19 19 19 19 19 12 19
cafe 340 340 340 340 340 340 340 340 340 340 340 340 340 340 340
camera_surveillance 149 149 149 149 149 149 149 149 149 149 149 149 149 2 149
car_dealership 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
car_rental 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
car_sharing 44 44 44 44 44 44 44 44 44 44 44 44 44 28 44
car_wash 7 7 7 7 7 7 7 7 7 7 7 7 7 3 7
chemist 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75
cinema 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
clothes 108 108 108 108 108 108 108 108 108 108 108 108 108 108 108
college 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
comms_tower 3 3 3 3 3 3 3 3 3 3 3 3 3 0 3
community_centre 78 78 78 78 78 78 78 78 78 78 78 78 78 78 78
computer_shop 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27
convenience 87 87 87 87 87 87 87 87 87 87 87 87 87 78 87
dentist 139 139 139 139 139 139 139 139 139 139 139 139 139 136 139
department_store 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
doctors 205 205 205 205 205 205 205 205 205 205 205 205 205 198 205
dog_park 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
doityourself 33 33 33 33 33 33 33 33 33 33 33 33 33 32 33
drinking_water 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
fast_food 243 243 243 243 243 243 243 243 243 243 243 243 243 240 243
fire_station 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
florist 76 76 76 76 76 76 76 76 76 76 76 76 76 74 76
food_court 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
fountain 5 5 5 5 5 5 5 5 5 5 5 5 5 0 5
furniture_shop 43 43 43 43 43 43 43 43 43 43 43 43 43 43 43
garden_centre 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2
gift_shop 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
greengrocer 9 9 9 9 9 9 9 9 9 9 9 9 9 8 9
guesthouse 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
hairdresser 335 335 335 335 335 335 335 335 335 335 335 335 335 322 335
hostel 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
hotel 90 90 90 90 90 90 90 90 90 90 90 90 90 88 90
jeweller 21 21 21 21 21 21 21 21 21 21 21 21 21 20 21
kindergarten 372 372 372 372 372 372 372 372 372 372 372 372 372 360 372
kiosk 215 215 215 215 215 215 215 215 215 215 215 215 215 151 215
laundry 72 72 72 72 72 72 72 72 72 72 72 72 72 37 72
library 32 32 32 32 32 32 32 32 32 32 32 32 32 30 32
memorial 1511 1511 1511 1511 1511 1511 1511 1511 1511 1511 1511 1511 1511 1501 1511
mobile_phone_shop 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
museum 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
newsagent 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
nightclub 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
optician 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21
outdoor_shop 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
park 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
pharmacy 158 158 158 158 158 158 158 158 158 158 158 158 158 158 158
picnic_site 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
pitch 42 42 42 42 42 42 42 42 42 42 42 42 42 1 42
playground 729 729 729 729 729 729 729 729 729 729 729 729 729 14 729
police 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
post_box 1042 1042 1042 1042 1042 1042 1042 1042 1042 1042 1042 1042 1042 0 1042
post_office 105 105 105 105 105 105 105 105 105 105 105 105 105 56 105
pub 270 270 270 270 270 270 270 270 270 270 270 270 270 262 270
recycling 56 56 56 56 56 56 56 56 56 56 56 56 56 0 56
recycling_clothes 177 177 177 177 177 177 177 177 177 177 177 177 177 0 177
recycling_metal 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
recycling_paper 843 843 843 843 843 843 843 843 843 843 843 843 843 0 843
restaurant 875 875 875 875 875 875 875 875 875 875 875 875 875 860 875
ruins 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
school 38 38 38 38 38 38 38 38 38 38 38 38 38 38 38
shelter 15 15 15 15 15 15 15 15 15 15 15 15 15 0 15
shoe_shop 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14
sports_centre 19 19 19 19 19 19 19 19 19 19 19 19 19 17 19
sports_shop 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
stationery 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
supermarket 260 260 260 260 260 260 260 260 260 260 260 260 260 260 260
telephone 116 116 116 116 116 116 116 116 116 116 116 116 116 0 116
theatre 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
toilet 36 36 36 36 36 36 36 36 36 36 36 36 36 3 36
tourist_info 58 58 58 58 58 58 58 58 58 58 58 58 58 28 58
tower 8 8 8 8 8 8 8 8 8 8 8 8 8 0 8
toy_shop 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
track 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
travel_agent 26 26 26 26 26 26 26 26 26 26 26 26 26 24 26
vending_any 190 190 190 190 190 190 190 190 190 190 190 190 190 2 190
vending_cigarette 111 111 111 111 111 111 111 111 111 111 111 111 111 0 111
vending_parking 41 41 41 41 41 41 41 41 41 41 41 41 41 0 41
veterinary 66 66 66 66 66 66 66 66 66 66 66 66 66 51 66
video_shop 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
viewpoint 35 35 35 35 35 35 35 35 35 35 35 35 35 14 35
waste_basket 369 369 369 369 369 369 369 369 369 369 369 369 369 1 369
water_well 9 9 9 9 9 9 9 9 9 9 9 9 9 0 9
wayside_shrine 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
windmill 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
In [78]:
# One Hot Encoding
hamburg_onehot = pd.get_dummies(df[['fclass']], prefix = "", prefix_sep = "")

## Add city column back to df
hamburg_onehot['quarter'] = df['quarter']

# Move neighbourhood column to the first column
fixed_columns = [hamburg_onehot.columns[-1]] + list(hamburg_onehot.columns[:-1])
hamburg_onehot = hamburg_onehot[fixed_columns]

# display
hamburg_onehot.head()
Out[78]:
quarter archaeological arts_centre artwork atm attraction bakery ... veterinary video_shop viewpoint waste_basket water_well wayside_shrine windmill
0 Farmsen-Berne 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
1 Wandsbek 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
2 Wilstorf 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
3 Rahlstedt 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
4 Blankenese 0 0 0 0 0 0 ... 0 0 0 0 0 0 0

5 rows × 102 columns

In [79]:
hamburg_onehot.tail()
Out[79]:
quarter archaeological arts_centre artwork atm attraction bakery ... veterinary video_shop viewpoint waste_basket water_well wayside_shrine windmill
12405 Rahlstedt 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
12406 Barmbek-Süd 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
12407 St. Pauli 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
12408 Winterhude 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
12409 Heimfeld 0 0 0 0 0 0 ... 0 0 0 0 0 0 0

5 rows × 102 columns

In [80]:
# New df dimensions
hamburg_onehot.shape
Out[80]:
(12410, 102)

Group rows by neighbourhood and by taking the mean and the frequency of occurrence of each category

In [81]:
hamburg_grouped = hamburg_onehot.groupby('quarter').mean().reset_index()
hamburg_grouped
Out[81]:
quarter archaeological arts_centre artwork atm attraction bakery ... veterinary video_shop viewpoint waste_basket water_well wayside_shrine windmill
0 Allermöhe 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1 Alsterdorf 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.140127 0.000000 0.000000 0.000000
2 Altengamme 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
3 Altona-Altstadt 0.000000 0.000000 0.005435 0.005435 0.005435 0.092391 ... 0.000000 0.000000 0.000000 0.016304 0.000000 0.000000 0.000000
4 Altona-Nord 0.000000 0.000000 0.007937 0.007937 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.015873 0.000000 0.000000 0.000000
5 Bahrenfeld 0.005814 0.000000 0.005814 0.000000 0.000000 0.029070 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
6 Barmbek-Nord 0.000000 0.004988 0.000000 0.007481 0.000000 0.034913 ... 0.004988 0.000000 0.000000 0.009975 0.000000 0.000000 0.000000
7 Barmbek-Süd 0.000000 0.000000 0.003086 0.000000 0.040123 0.030864 ... 0.012346 0.000000 0.000000 0.021605 0.000000 0.000000 0.000000
8 Bergedorf 0.000000 0.003704 0.059259 0.011111 0.000000 0.003704 ... 0.000000 0.000000 0.003704 0.203704 0.000000 0.000000 0.000000
9 Bergstedt 0.000000 0.000000 0.040000 0.000000 0.000000 0.000000 ... 0.040000 0.000000 0.000000 0.040000 0.000000 0.000000 0.000000
10 Billbrook 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
11 Billstedt 0.007353 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.014706 0.000000 0.000000 0.000000
12 Billwerder 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
13 Blankenese 0.000000 0.000000 0.000000 0.000000 0.000000 0.014493 ... 0.028986 0.000000 0.043478 0.028986 0.000000 0.000000 0.000000
14 Borgfelde 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 ... 0.000000 0.000000 0.000000 0.010000 0.000000 0.000000 0.000000
15 Bramfeld 0.000000 0.000000 0.000000 0.007812 0.000000 0.000000 ... 0.003906 0.000000 0.000000 0.019531 0.000000 0.000000 0.000000
16 Cranz 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
17 Curslack 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
18 Dulsberg 0.000000 0.000000 0.000000 0.000000 0.000000 0.008197 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
19 Duvenstedt 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
20 Eidelstedt 0.000000 0.000000 0.000000 0.000000 0.000000 0.012876 ... 0.000000 0.000000 0.000000 0.047210 0.000000 0.000000 0.000000
21 Eilbek 0.000000 0.000000 0.000000 0.000000 0.000000 0.011628 ... 0.000000 0.000000 0.000000 0.058140 0.000000 0.000000 0.000000
22 Eimsbüttel 0.000000 0.000000 0.014045 0.002809 0.000000 0.030899 ... 0.002809 0.000000 0.000000 0.050562 0.000000 0.000000 0.000000
23 Eißendorf 0.000000 0.000000 0.000000 0.025000 0.000000 0.081250 ... 0.006250 0.000000 0.006250 0.018750 0.000000 0.000000 0.000000
24 Eppendorf 0.000000 0.006734 0.000000 0.000000 0.000000 0.016835 ... 0.000000 0.000000 0.000000 0.003367 0.000000 0.000000 0.000000
25 Farmsen-Berne 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.009009 0.000000 0.000000 0.081081 0.000000 0.000000 0.000000
26 Finkenwerder 0.000000 0.000000 0.084746 0.000000 0.000000 0.016949 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
27 Francop 0.000000 0.000000 0.000000 0.000000 0.000000 0.333333 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
28 Fuhlsbüttel 0.000000 0.000000 0.000000 0.000000 0.000000 0.043478 ... 0.043478 0.000000 0.000000 0.113043 0.000000 0.000000 0.000000
29 Groß Borstel 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.051282 0.000000 0.000000 0.000000
30 Groß Flottbek 0.000000 0.000000 0.000000 0.000000 0.000000 0.041096 ... 0.000000 0.000000 0.000000 0.041096 0.000000 0.000000 0.000000
31 HafenCity 0.000000 0.000000 0.000000 0.108434 0.006024 0.006024 ... 0.000000 0.000000 0.078313 0.006024 0.000000 0.000000 0.000000
32 Hamb.-Altstadt 0.000000 0.000000 0.280992 0.000000 0.000000 0.008264 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
33 Hamm-Mitte 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.127907 0.000000 0.000000 0.000000
34 Hamm-Nord 0.000000 0.000000 0.000000 0.020270 0.000000 0.013514 ... 0.000000 0.000000 0.000000 0.047297 0.000000 0.000000 0.000000
35 Hamm-Süd 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
36 Hammerbrook 0.000000 0.000000 0.184000 0.024000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
37 Harburg 0.000000 0.000000 0.021807 0.000000 0.000000 0.015576 ... 0.000000 0.000000 0.000000 0.024922 0.000000 0.000000 0.000000
38 Harvestehude 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
39 Hausbruch 0.000000 0.000000 0.112903 0.016129 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.032258 0.000000 0.000000 0.000000
40 Heimfeld 0.000000 0.000000 0.029046 0.004149 0.000000 0.012448 ... 0.000000 0.000000 0.000000 0.029046 0.000000 0.000000 0.000000
41 Hoheluft-Ost 0.000000 0.000000 0.000000 0.022989 0.000000 0.011494 ... 0.000000 0.000000 0.000000 0.011494 0.000000 0.000000 0.000000
42 Hoheluft-West 0.000000 0.000000 0.000000 0.000000 0.000000 0.009174 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
43 Hohenfelde 0.000000 0.000000 0.000000 0.000000 0.000000 0.022901 ... 0.000000 0.000000 0.000000 0.007634 0.000000 0.000000 0.000000
44 Horn 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.028302 0.000000 0.000000 0.000000
45 Hummelsbüttel 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.063158 0.000000 0.000000 0.000000
46 Iserbrook 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.062500 0.000000 0.000000 0.000000
47 Jenfeld 0.000000 0.000000 0.000000 0.000000 0.000000 0.011364 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
48 Kirchwerder 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
49 Kleiner Grasbrook 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.142857 0.000000 0.000000 0.000000
50 Langenbek 0.000000 0.000000 0.117647 0.000000 0.000000 0.058824 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
51 Langenhorn 0.000000 0.000000 0.000000 0.000000 0.000000 0.006250 ... 0.000000 0.000000 0.000000 0.062500 0.000000 0.003125 0.000000
52 Lemsahl-Mellingstedt 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.040000 0.000000 0.000000 0.080000 0.000000 0.000000 0.000000
53 Lohbrügge 0.000000 0.000000 0.000000 0.000000 0.000000 0.037736 ... 0.000000 0.000000 0.000000 0.025157 0.000000 0.000000 0.000000
54 Lokstedt 0.000000 0.000000 0.005405 0.000000 0.000000 0.027027 ... 0.000000 0.000000 0.000000 0.021622 0.000000 0.000000 0.000000
55 Lurup 0.000000 0.000000 0.213675 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
56 Marienthal 0.000000 0.000000 0.000000 0.000000 0.000000 0.016529 ... 0.000000 0.000000 0.000000 0.082645 0.000000 0.000000 0.000000
57 Marmstorf 0.000000 0.000000 0.000000 0.014286 0.000000 0.014286 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
58 Moorburg 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
59 Moorfleet 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
60 Neuenfelde 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
61 Neuengamme 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
62 Neugraben-Fischbek 0.000000 0.000000 0.000000 0.000000 0.000000 0.034188 ... 0.000000 0.000000 0.000000 0.034188 0.000000 0.000000 0.000000
63 Neustadt 0.000000 0.006289 0.006289 0.000000 0.006289 0.018868 ... 0.000000 0.000000 0.000000 0.012579 0.000000 0.000000 0.000000
64 Niendorf 0.006969 0.000000 0.000000 0.000000 0.000000 0.010453 ... 0.006969 0.000000 0.003484 0.000000 0.000000 0.000000 0.000000
65 Nienstedten 0.000000 0.000000 0.000000 0.000000 0.000000 0.043478 ... 0.000000 0.000000 0.000000 0.065217 0.000000 0.000000 0.000000
66 Ochsenwerder 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
67 Ohlsdorf 0.008403 0.000000 0.000000 0.000000 0.000000 0.016807 ... 0.000000 0.000000 0.000000 0.025210 0.000000 0.000000 0.000000
68 Osdorf 0.000000 0.000000 0.043011 0.000000 0.000000 0.075269 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.010753
69 Othmarschen 0.000000 0.000000 0.017241 0.000000 0.000000 0.005747 ... 0.040230 0.000000 0.063218 0.000000 0.000000 0.000000 0.000000
70 Ottensen 0.000000 0.000000 0.000000 0.000000 0.000000 0.005291 ... 0.000000 0.000000 0.005291 0.021164 0.000000 0.000000 0.000000
71 Poppenbüttel 0.028302 0.000000 0.000000 0.000000 0.000000 0.009434 ... 0.000000 0.000000 0.000000 0.009434 0.000000 0.000000 0.000000
72 Rahlstedt 0.000000 0.001908 0.003817 0.000000 0.000000 0.028626 ... 0.009542 0.001908 0.000000 0.011450 0.000000 0.000000 0.000000
73 Reitbrook 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
74 Rissen 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.144928 0.000000 0.000000 0.000000
75 Rothenburgsort 0.000000 0.000000 0.000000 0.071429 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.166667 0.000000 0.000000 0.000000
76 Rotherbaum 0.000000 0.000000 0.000000 0.000000 0.000000 0.021739 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
77 Rönneburg 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
78 Sasel 0.009174 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.009174 0.000000 0.000000 0.009174 0.000000 0.000000 0.000000
79 Schnelsen 0.000000 0.000000 0.000000 0.010753 0.000000 0.010753 ... 0.021505 0.010753 0.000000 0.000000 0.048387 0.000000 0.000000
80 Sinstorf 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
81 Spadenland 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
82 St. Georg 0.000000 0.000000 0.000000 0.000000 0.000000 0.006711 ... 0.000000 0.000000 0.000000 0.013423 0.000000 0.000000 0.000000
83 St. Pauli 0.000000 0.000000 0.000000 0.062937 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.013986 0.000000 0.000000 0.000000
84 Steilshoop 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.218750 0.000000 0.000000 0.000000
85 Steinwerder 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
86 Stellingen 0.000000 0.000000 0.000000 0.040179 0.004464 0.013393 ... 0.004464 0.000000 0.000000 0.044643 0.000000 0.000000 0.000000
87 Sternschanze 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
88 Sülldorf 0.000000 0.000000 0.000000 0.000000 0.000000 0.176471 ... 0.000000 0.000000 0.000000 0.029412 0.000000 0.000000 0.000000
89 Tatenberg 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
90 Tonndorf 0.000000 0.000000 0.000000 0.000000 0.000000 0.051282 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
91 Uhlenhorst 0.000000 0.000000 0.000000 0.004785 0.000000 0.019139 ... 0.028708 0.000000 0.000000 0.004785 0.000000 0.000000 0.000000
92 Veddel 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
93 Volksdorf 0.000000 0.000000 0.037037 0.000000 0.000000 0.018519 ... 0.018519 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
94 Wandsbek 0.000000 0.000000 0.000000 0.000000 0.000000 0.023684 ... 0.007895 0.000000 0.000000 0.002632 0.000000 0.000000 0.000000
95 Wellingsbüttel 0.033333 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
96 Wilhelmsburg 0.000000 0.000000 0.000000 0.000000 0.000000 0.045455 ... 0.000000 0.000000 0.000000 0.007576 0.000000 0.000000 0.000000
97 Wilstorf 0.000000 0.000000 0.000000 0.000000 0.000000 0.013889 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
98 Winterhude 0.000000 0.000000 0.000000 0.004702 0.000000 0.007837 ... 0.023511 0.000000 0.001567 0.020376 0.000000 0.000000 0.000000
99 Wohldorf-Ohlstedt 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

100 rows × 102 columns

Get each neighbourhood along with its top 5 most common venues

In [82]:
num_top_venues = 5

for hood in hamburg_grouped['quarter']:
    print("----"+hood+"----")
    temp = hamburg_grouped[hamburg_grouped['quarter'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
----Allermöhe----
               venue  freq
0         playground  0.77
1              bench  0.12
2  vending_cigarette  0.04
3           post_box  0.04
4        hairdresser  0.02


----Alsterdorf----
             venue  freq
0            bench  0.27
1     waste_basket  0.14
2  recycling_paper  0.07
3  vending_parking  0.07
4     kindergarten  0.06


----Altengamme----
        venue  freq
0    post_box   0.4
1     shelter   0.2
2       pitch   0.2
3        bank   0.2
4  restaurant   0.0


----Altona-Altstadt----
         venue  freq
0     memorial  0.11
1       bakery  0.09
2  supermarket  0.09
3   restaurant  0.08
4   playground  0.05


----Altona-Nord----
                 venue  freq
0                bench  0.15
1      recycling_paper  0.11
2  camera_surveillance  0.10
3       bicycle_rental  0.08
4            telephone  0.06


----Bahrenfeld----
          venue  freq
0  kindergarten  0.12
1    restaurant  0.10
2    playground  0.09
3      post_box  0.09
4       dentist  0.08


----Barmbek-Nord----
             venue  freq
0            bench  0.12
1       playground  0.10
2              pub  0.10
3  recycling_paper  0.08
4         memorial  0.07


----Barmbek-Süd----
          venue  freq
0      memorial  0.11
1    restaurant  0.10
2         bench  0.07
3      post_box  0.07
4  kindergarten  0.06


----Bergedorf----
             venue  freq
0     waste_basket  0.20
1            bench  0.17
2       restaurant  0.12
3  recycling_paper  0.09
4         post_box  0.07


----Bergstedt----
        venue  freq
0    post_box  0.48
1       kiosk  0.08
2  playground  0.08
3  veterinary  0.04
4  restaurant  0.04


----Billbrook----
             venue  freq
0   furniture_shop   1.0
1   archaeological   0.0
2      picnic_site   0.0
3  recycling_paper   0.0
4  recycling_metal   0.0


----Billstedt----
             venue  freq
0        fast_food  0.11
1         memorial  0.11
2         post_box  0.11
3  recycling_paper  0.11
4              pub  0.10


----Billwerder----
             venue  freq
0       restaurant  0.67
1           museum  0.33
2   archaeological  0.00
3            pitch  0.00
4  recycling_paper  0.00


----Blankenese----
         venue  freq
0        bench  0.17
1     post_box  0.12
2     memorial  0.10
3  vending_any  0.10
4   restaurant  0.06


----Borgfelde----
          venue  freq
0      memorial  0.55
1      post_box  0.18
2       theatre  0.12
3         bench  0.04
4  tourist_info  0.03


----Bramfeld----
             venue  freq
0         post_box  0.18
1       restaurant  0.17
2  recycling_paper  0.07
3         memorial  0.05
4        telephone  0.04


----Cranz----
             venue  freq
0         post_box  0.67
1  recycling_paper  0.33
2   archaeological  0.00
3            pitch  0.00
4       restaurant  0.00


----Curslack----
               venue  freq
0            doctors  0.31
1           post_box  0.25
2  recycling_clothes  0.12
3            florist  0.12
4         restaurant  0.06


----Dulsberg----
               venue  freq
0              bench  0.19
1           memorial  0.14
2        hairdresser  0.07
3  recycling_clothes  0.07
4                pub  0.07


----Duvenstedt----
          venue  freq
0         bench  0.23
1  bicycle_shop  0.19
2  kindergarten  0.15
3    restaurant  0.12
4       doctors  0.08


----Eidelstedt----
               venue  freq
0         playground  0.16
1              bench  0.12
2           post_box  0.12
3  recycling_clothes  0.07
4    recycling_paper  0.06


----Eilbek----
      venue  freq
0  memorial  0.25
1      cafe  0.09
2       pub  0.08
3     bench  0.08
4  post_box  0.08


----Eimsbüttel----
          venue  freq
0      memorial  0.16
1    restaurant  0.09
2  kindergarten  0.08
3   hairdresser  0.07
4          cafe  0.06


----Eißendorf----
             venue  freq
0       restaurant  0.12
1         post_box  0.09
2  recycling_paper  0.08
3           bakery  0.08
4       playground  0.08


----Eppendorf----
             venue  freq
0         memorial  0.27
1             cafe  0.13
2         post_box  0.08
3       restaurant  0.08
4  recycling_paper  0.06


----Farmsen-Berne----
          venue  freq
0         bench  0.23
1      post_box  0.20
2    playground  0.16
3      memorial  0.09
4  waste_basket  0.08


----Finkenwerder----
             venue  freq
0       playground  0.24
1         post_box  0.24
2          artwork  0.08
3  recycling_paper  0.08
4            bench  0.07


----Francop----
            venue  freq
0          bakery  0.33
1      guesthouse  0.33
2    fire_station  0.33
3  archaeological  0.00
4           pitch  0.00


----Fuhlsbüttel----
          venue  freq
0   hairdresser  0.11
1  waste_basket  0.11
2         bench  0.10
3    restaurant  0.10
4     fast_food  0.08


----Groß Borstel----
             venue  freq
0         post_box  0.27
1            bench  0.15
2  vending_parking  0.12
3     kindergarten  0.10
4  recycling_paper  0.06


----Groß Flottbek----
             venue  freq
0         post_box  0.23
1            bench  0.11
2          library  0.08
3  recycling_paper  0.07
4             cafe  0.05


----HafenCity----
        venue  freq
0  restaurant  0.18
1       bench  0.15
2         atm  0.11
3   viewpoint  0.08
4   fast_food  0.07


----Hamb.-Altstadt----
        venue  freq
0     artwork  0.28
1  restaurant  0.18
2    post_box  0.14
3     clothes  0.10
4    memorial  0.07


----Hamm-Mitte----
               venue  freq
0           memorial  0.30
1       waste_basket  0.13
2  vending_cigarette  0.07
3          fast_food  0.07
4              bench  0.06


----Hamm-Nord----
             venue  freq
0         memorial  0.26
1      hairdresser  0.12
2     waste_basket  0.05
3       restaurant  0.05
4  recycling_paper  0.05


----Hamm-Süd----
         venue  freq
0     post_box  0.31
1        kiosk  0.28
2       school  0.14
3  vending_any  0.14
4     memorial  0.07


----Hammerbrook----
        venue  freq
0       hotel  0.27
1     artwork  0.18
2  car_rental  0.10
3         bar  0.09
4         pub  0.09


----Harburg----
          venue  freq
0      memorial  0.20
1         bench  0.12
2         kiosk  0.07
3  kindergarten  0.06
4           pub  0.05


----Harvestehude----
             venue  freq
0         memorial  0.75
1         post_box  0.05
2  recycling_paper  0.03
3     kindergarten  0.02
4          dentist  0.02


----Hausbruch----
              venue  freq
0        playground  0.18
1          post_box  0.16
2           artwork  0.11
3  community_centre  0.10
4             bench  0.10


----Heimfeld----
               venue  freq
0           memorial  0.24
1  recycling_clothes  0.12
2           post_box  0.08
3              kiosk  0.06
4       kindergarten  0.05


----Hoheluft-Ost----
         venue  freq
0     memorial  0.45
1   restaurant  0.11
2         cafe  0.06
3  hairdresser  0.05
4        bench  0.03


----Hoheluft-West----
         venue  freq
0        bench  0.17
1     memorial  0.14
2   restaurant  0.11
3  supermarket  0.08
4    fast_food  0.06


----Hohenfelde----
          venue  freq
0      memorial  0.56
1  kindergarten  0.08
2      post_box  0.06
3         kiosk  0.04
4    restaurant  0.03


----Horn----
         venue  freq
0     memorial  0.16
1   playground  0.15
2      laundry  0.11
3    telephone  0.08
4  hairdresser  0.07


----Hummelsbüttel----
          venue  freq
0    playground  0.35
1      post_box  0.15
2       doctors  0.11
3  waste_basket  0.06
4    restaurant  0.05


----Iserbrook----
             venue  freq
0         post_box  0.31
1            bench  0.17
2  recycling_paper  0.17
3      supermarket  0.08
4     waste_basket  0.06


----Jenfeld----
        venue  freq
0  playground  0.23
1  restaurant  0.15
2   telephone  0.08
3       kiosk  0.07
4    post_box  0.07


----Kirchwerder----
              venue  freq
0          post_box  0.33
1             bench  0.17
2       post_office  0.10
3       supermarket  0.10
4  community_centre  0.03


----Kleiner Grasbrook----
                 venue  freq
0          convenience  0.43
1  camera_surveillance  0.21
2      recycling_paper  0.14
3         waste_basket  0.14
4           playground  0.07


----Langenbek----
             venue  freq
0         post_box  0.29
1       playground  0.24
2  recycling_paper  0.18
3          artwork  0.12
4       restaurant  0.06


----Langenhorn----
             venue  freq
0            bench  0.24
1       playground  0.18
2             cafe  0.12
3     waste_basket  0.06
4  recycling_paper  0.05


----Lemsahl-Mellingstedt----
          venue  freq
0      memorial  0.32
1      post_box  0.24
2  fire_station  0.16
3      pharmacy  0.08
4  waste_basket  0.08


----Lohbrügge----
             venue  freq
0            bench  0.35
1  recycling_paper  0.14
2      vending_any  0.06
3      supermarket  0.06
4        fast_food  0.04


----Lokstedt----
             venue  freq
0  recycling_paper  0.11
1            bench  0.10
2         post_box  0.09
3   bicycle_rental  0.09
4       playground  0.08


----Lurup----
             venue  freq
0          artwork  0.21
1  recycling_paper  0.14
2     kindergarten  0.13
3       restaurant  0.13
4       playground  0.07


----Marienthal----
               venue  freq
0           memorial  0.31
1           post_box  0.10
2            chemist  0.09
3       waste_basket  0.08
4  recycling_clothes  0.08


----Marmstorf----
             venue  freq
0  recycling_paper  0.20
1         post_box  0.13
2      hairdresser  0.11
3          butcher  0.10
4             bank  0.07


----Moorburg----
             venue  freq
0            bench   1.0
1   archaeological   0.0
2            pitch   0.0
3       restaurant   0.0
4  recycling_paper   0.0


----Moorfleet----
            venue  freq
0           bench   0.5
1  furniture_shop   0.5
2  archaeological   0.0
3           pitch   0.0
4      restaurant   0.0


----Neuenfelde----
             venue  freq
0        recycling   0.3
1       playground   0.3
2     kindergarten   0.1
3       restaurant   0.1
4  recycling_paper   0.1


----Neuengamme----
             venue  freq
0         post_box  0.25
1          shelter  0.12
2         memorial  0.12
3  recycling_paper  0.12
4          dentist  0.12


----Neugraben-Fischbek----
             venue  freq
0       restaurant  0.18
1  recycling_paper  0.14
2        recycling  0.09
3       playground  0.07
4            bench  0.05


----Neustadt----
        venue  freq
0    memorial  0.21
1       bench  0.17
2  restaurant  0.12
3     clothes  0.06
4         bar  0.06


----Niendorf----
             venue  freq
0       playground  0.17
1  recycling_paper  0.16
2         post_box  0.11
3       restaurant  0.07
4     kindergarten  0.05


----Nienstedten----
             venue  freq
0            bench  0.15
1         post_box  0.15
2       restaurant  0.11
3     kindergarten  0.09
4  recycling_paper  0.09


----Ochsenwerder----
             venue  freq
0         memorial  0.46
1         post_box  0.31
2  recycling_paper  0.23
3   archaeological  0.00
4            pitch  0.00


----Ohlsdorf----
             venue  freq
0  recycling_paper  0.27
1            bench  0.15
2         pharmacy  0.12
3         post_box  0.09
4     kindergarten  0.09


----Osdorf----
               venue  freq
0           post_box  0.18
1            doctors  0.11
2  vending_cigarette  0.11
3             bakery  0.08
4         playground  0.06


----Othmarschen----
                 venue  freq
0           playground  0.16
1             pharmacy  0.14
2                bench  0.10
3  camera_surveillance  0.09
4         tourist_info  0.08


----Ottensen----
             venue  freq
0            bench  0.10
1              pub  0.08
2       restaurant  0.07
3         memorial  0.06
4  recycling_paper  0.06


----Poppenbüttel----
             venue  freq
0            bench  0.15
1      hairdresser  0.13
2  recycling_paper  0.12
3         post_box  0.10
4          dentist  0.08


----Rahlstedt----
             venue  freq
0         post_box  0.18
1       playground  0.15
2       restaurant  0.08
3            bench  0.07
4  recycling_paper  0.06


----Reitbrook----
             venue  freq
0      convenience   0.5
1         post_box   0.5
2   archaeological   0.0
3            pitch   0.0
4  recycling_paper   0.0


----Rissen----
             venue  freq
0  recycling_paper  0.43
1     waste_basket  0.14
2      supermarket  0.11
3            bench  0.10
4        fast_food  0.04


----Rothenburgsort----
               venue  freq
0           memorial  0.52
1       waste_basket  0.17
2                atm  0.07
3         playground  0.05
4  vending_cigarette  0.02


----Rotherbaum----
         venue  freq
0     memorial  0.44
1   restaurant  0.12
2         cafe  0.04
3     post_box  0.04
4  hairdresser  0.04


----Rönneburg----
           venue  freq
0       post_box  0.23
1          bench  0.23
2     restaurant  0.15
3  sports_centre  0.15
4     playground  0.15


----Sasel----
         venue  freq
0     post_box  0.22
1  supermarket  0.16
2        bench  0.15
3  vending_any  0.07
4     memorial  0.06


----Schnelsen----
             venue  freq
0  recycling_paper  0.19
1       playground  0.15
2         post_box  0.14
3       restaurant  0.08
4     doityourself  0.08


----Sinstorf----
             venue  freq
0  recycling_paper  0.73
1            kiosk  0.08
2            bench  0.08
3         post_box  0.08
4         memorial  0.04


----Spadenland----
            venue  freq
0      restaurant  0.50
1        memorial  0.25
2        post_box  0.25
3  archaeological  0.00
4           pitch  0.00


----St. Georg----
               venue  freq
0           memorial  0.21
1  vending_cigarette  0.07
2       bicycle_shop  0.07
3              bench  0.07
4         restaurant  0.05


----St. Pauli----
        venue  freq
0         pub  0.18
1         bar  0.16
2  restaurant  0.09
3    memorial  0.08
4         atm  0.06


----Steilshoop----
              venue  freq
0   recycling_paper  0.44
1      waste_basket  0.22
2  community_centre  0.06
3             bench  0.06
4      kindergarten  0.06


----Steinwerder----
             venue  freq
0            bench   1.0
1   archaeological   0.0
2            pitch   0.0
3       restaurant   0.0
4  recycling_paper   0.0


----Stellingen----
             venue  freq
0            bench  0.21
1  recycling_paper  0.14
2         post_box  0.09
3       restaurant  0.08
4          doctors  0.07


----Sternschanze----
          venue  freq
0  bicycle_shop  0.16
1          cafe  0.08
2  doityourself  0.08
3    restaurant  0.08
4   supermarket  0.08


----Sülldorf----
          venue  freq
0      post_box  0.29
1         bench  0.18
2        bakery  0.18
3    restaurant  0.09
4  kindergarten  0.06


----Tatenberg----
             venue  freq
0      convenience   1.0
1   archaeological   0.0
2            pitch   0.0
3  recycling_paper   0.0
4  recycling_metal   0.0


----Tonndorf----
             venue  freq
0         post_box  0.13
1             cafe  0.11
2  recycling_paper  0.11
3        fast_food  0.10
4       restaurant  0.09


----Uhlenhorst----
        venue  freq
0    memorial  0.20
1       bench  0.14
2  restaurant  0.11
3    post_box  0.09
4        cafe  0.05


----Veddel----
               venue  freq
0                bar  0.18
1              kiosk  0.18
2  recycling_clothes  0.18
3           memorial  0.12
4    recycling_paper  0.12


----Volksdorf----
             venue  freq
0         post_box  0.13
1          doctors  0.09
2  recycling_paper  0.07
3         memorial  0.06
4       restaurant  0.06


----Wandsbek----
                 venue  freq
0           restaurant  0.09
1  camera_surveillance  0.08
2         kindergarten  0.06
3                 cafe  0.06
4                bench  0.05


----Wellingsbüttel----
          venue  freq
0      post_box  0.16
1         bench  0.13
2      memorial  0.10
3       doctors  0.10
4  bicycle_shop  0.08


----Wilhelmsburg----
                 venue  freq
0      recycling_paper  0.16
1             post_box  0.14
2          convenience  0.09
3                kiosk  0.08
4  camera_surveillance  0.06


----Wilstorf----
             venue  freq
0         post_box  0.32
1  recycling_paper  0.14
2         memorial  0.11
3            kiosk  0.11
4       playground  0.11


----Winterhude----
         venue  freq
0     memorial  0.25
1   restaurant  0.17
2         cafe  0.06
3  hairdresser  0.05
4   playground  0.04


----Wohldorf-Ohlstedt----
      venue  freq
0     bench  0.24
1  memorial  0.19
2  pharmacy  0.14
3       pub  0.14
4  post_box  0.14


Put that in pandas data frame

In [83]:
# Function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
In [84]:
# New dataframe ordered
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['quarter']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['quarter'] = hamburg_grouped['quarter']

for ind in np.arange(hamburg_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hamburg_grouped.iloc[ind, :], 
                                                                          num_top_venues)

neighbourhoods_venues_sorted.head()
Out[84]:
quarter 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue
0 Allermöhe playground bench post_box vending_cigarette hairdresser
1 Alsterdorf bench waste_basket vending_parking recycling_paper kindergarten
2 Altengamme post_box shelter bank pitch windmill
3 Altona-Altstadt memorial bakery supermarket restaurant bicycle_rental
4 Altona-Nord bench recycling_paper camera_surveillance bicycle_rental telephone
In [85]:
neighbourhoods_venues_sorted.tail()
Out[85]:
quarter 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue
95 Wellingsbüttel post_box bench doctors memorial bicycle_shop
96 Wilhelmsburg recycling_paper post_box convenience kiosk memorial
97 Wilstorf post_box recycling_paper kiosk playground memorial
98 Winterhude memorial restaurant cafe hairdresser playground
99 Wohldorf-Ohlstedt bench memorial pub post_box pharmacy

Let's look at how many categories are on the 1st most common venue.

In [86]:
print('There are {} unique categories.'.format(len(neighbourhoods_venues_sorted['1st Most Common Venue'].unique())))
There are 18 unique categories.
In [87]:
neighbourhoods_venues_sorted.groupby('1st Most Common Venue').count()
Out[87]:
quarter 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue
1st Most Common Venue
artwork 2 2 2 2 2
bench 17 17 17 17 17
bicycle_shop 1 1 1 1 1
convenience 3 3 3 3 3
doctors 1 1 1 1 1
fast_food 1 1 1 1 1
furniture_shop 2 2 2 2 2
guesthouse 1 1 1 1 1
hotel 1 1 1 1 1
kindergarten 1 1 1 1 1
memorial 23 23 23 23 23
playground 9 9 9 9 9
post_box 20 20 20 20 20
pub 1 1 1 1 1
recycling_clothes 1 1 1 1 1
recycling_paper 8 8 8 8 8
restaurant 6 6 6 6 6
waste_basket 2 2 2 2 2

From above, we observe that the most common venues across the dataset are Memorial, Post Box, Bench, Play Ground, Recycling Paper, followed by Restaurant. Different restaurant venues are in subcategories which makes them less common than if they were aggregated. Thus, for the purpose accounting for the venues that may have the most impact on price, we will limit the venues to the most common categories. It is unlikely that having a hotels, restaurants will affect price. I think these places or venues are not affecting the rent prices at all. Thus, that category of venues will not be considered.

6. Preparing data for modelling

In [88]:
# Read dataset
df = pd.read_csv('ckdnearest(df3, df2).csv')
df.head()
Out[88]:
Unnamed: 0 living_space rooms cold_rent quarter city postcode ... longitude geometry osm_id code fclass name dist
0 0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 ... 10.121997 POINT (10.12199735210979 53.59779278086435) 1190663105 2205 playground NaN 0.002903
1 1 78.0 3.0 819.6 Wandsbek Hamburg 22047 ... 10.086159 POINT (10.0861590022222 53.58669136622085) 4100013663 2514 chemist Budnikowsky 0.000321
2 2 62.4 3.5 504.0 Wilstorf Hamburg 21079 ... 9.996353 POINT (9.996352831413693 53.44490810458736) 3517456812 2724 memorial Berthold Bormann 0.000318
3 3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 ... 10.145269 POINT (10.14526927662929 53.62468452926397) 6348457833 2593 vending_any NaN 0.000537
4 4 144.0 3.5 2280.0 Blankenese Hamburg 22587 ... 9.799682 POINT (9.799681528125992 53.55965182828339) 308012942 2902 bench NaN 0.000655

5 rows × 17 columns

In [89]:
# Dropping variables no longer needed
df.drop(['longitude', 'latitude', 'name', 'osm_id', 'Unnamed: 0', 'geometry', 'code', 'fclass', 'name', 'dist', 'quarter', 'publish_date'], axis=1, inplace=True)
In [90]:
df.head()
Out[90]:
living_space rooms cold_rent city postcode rent_per_square_meter
0 116.4 4.5 1453.2 Hamburg 22159 14.976
1 78.0 3.0 819.6 Hamburg 22047 12.612
2 62.4 3.5 504.0 Hamburg 21079 9.696
3 98.4 3.5 1131.6 Hamburg 22145 13.800
4 144.0 3.5 2280.0 Hamburg 22587 18.996

We get dummies for our categorical variables to get the dataset ready for multicollinearity analysis.

In [91]:
transformed_df = pd.get_dummies(df)
transformed_df.head()
Out[91]:
living_space rooms cold_rent postcode rent_per_square_meter city_Bahrenfeld city_Barmbek ... city_St. Georg city_St. Pauli city_Uhlenhorst city_Uhlenhorst Hamburg city_Volksdorf city_Wilhelmsburg city_Winterhude
0 116.4 4.5 1453.2 22159 14.976 0 0 ... 0 0 0 0 0 0 0
1 78.0 3.0 819.6 22047 12.612 0 0 ... 0 0 0 0 0 0 0
2 62.4 3.5 504.0 21079 9.696 0 0 ... 0 0 0 0 0 0 0
3 98.4 3.5 1131.6 22145 13.800 0 0 ... 0 0 0 0 0 0 0
4 144.0 3.5 2280.0 22587 18.996 0 0 ... 0 0 0 0 0 0 0

5 rows × 189 columns

We now assess for multicollinearity of features:

In [92]:
def multi_collinearity_heatmap(df, figsize=(11,9)):
    
    """
    Creates a heatmap of correlations between features in the df. A figure size can optionally be set.
    """
    
    # Set the style of the visualization
    sns.set(style="white")

    # Create a covariance matrix
    corr = df.corr()

    # Generate a mask the size of our covariance matrix
    mask = np.zeros_like(corr, dtype=np.bool)
    mask[np.triu_indices_from(mask)] = True

    # Set up the matplotlib figure
    f, ax = plt.subplots(figsize=figsize)

    # Generate a custom diverging colormap
    cmap = sns.diverging_palette(220, 10, as_cmap=True)

    # Draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(corr, mask=mask, cmap=cmap, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .5}, vmax=corr[corr != 1.0].max().max());
In [93]:
multi_collinearity_heatmap(transformed_df, figsize=(40,40))

It doesn't look like there are any significant collinear relationship with city, so these will temporarily be dropped to produce a clearer heatmap for the remaining features.

In [94]:
multi_collinearity_heatmap(transformed_df.drop(list(transformed_df.columns[transformed_df.columns.str.startswith('city')]), axis=1), figsize=(25,22))

Now it looks better without the attribute city. But we will keep both city and zipcode in our model.

Standardising and normalising¶

In [95]:
numerical_columns = ['living_space','cold_rent','rent_per_square_meter', 'rooms', 'postcode']
In [96]:
transformed_df[numerical_columns].hist(figsize=(10,11));
In [97]:
# Separating X and y
X = transformed_df.drop('rent_per_square_meter', axis=1)
y = transformed_df.rent_per_square_meter

# Scaling
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=list(X.columns))
In [98]:
# Splitting into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

7. Model : Gradient boosted decision trees

Xgboost is short for eXtreme Gradient Boosting package. Xgboost is a library which is designed and optimised for boosting tree algorithms. Extreme Gradient Boosting (xgboost) is same as gradient boosting framework but Xgboost is more efficient, flexible and portable. It is the package which is used to solve data science problems which include both linear model solver and tree learning algorithms.

Gradient boosting is an approach where new models are created that predict the residuals or errors of prior models and then added together to make the final prediction. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models. XGBoost (eXtreme Gradient Boosting) is an implementation of gradient boosted decision trees designed for speed and performance. Is a very popular algorithm that has recently been dominating applied machine learning for structured or tabular data.

This model will most likely provide the best achievable accuracy and a measure of feature importance compared to our Hedonic regression (other than possible small accuracy increases from hyper-parameter tuning) due to XGBoost's superior performance.

In [99]:
xgb_reg_start = time.time()

xgb_reg = xgb.XGBRegressor()
xgb_reg.fit(X_train, y_train)
training_preds_xgb_reg = xgb_reg.predict(X_train)
val_preds_xgb_reg = xgb_reg.predict(X_test)

xgb_reg_end = time.time()

print(f"Time taken to run: {round((xgb_reg_end - xgb_reg_start)/60,1)} minutes")
print("\nTraining MSE:", round(mean_squared_error(y_train, training_preds_xgb_reg),4))
print("Validation MSE:", round(mean_squared_error(y_test, val_preds_xgb_reg),4))
print("\nTraining r2:", round(r2_score(y_train, training_preds_xgb_reg),4))
print("Validation r2:", round(r2_score(y_test, val_preds_xgb_reg),4))
Time taken to run: 0.1 minutes

Training MSE: 0.0218
Validation MSE: 0.0734

Training r2: 0.9988
Validation r2: 0.9962

This model looks good and our features explain approximately 99.88% of the variance in our target variable.

In [100]:
y_test_array = np.array(list(y_test))
val_preds_xgb_reg_array = np.array(val_preds_xgb_reg)
hpm_df = pd.DataFrame({'Actual': y_test_array.flatten(), 'Predicted': val_preds_xgb_reg_array.flatten()})
hpm_df
Out[100]:
Actual Predicted
0 16.212 16.594782
1 11.772 11.665022
2 9.744 9.923347
3 9.360 9.552456
4 7.044 7.316082
... ... ...
2477 10.848 11.018529
2478 12.636 12.643166
2479 13.800 13.675182
2480 8.388 8.522924
2481 12.096 12.384506

2482 rows × 2 columns

In [101]:
actual_values = y_test
plt.scatter(val_preds_xgb_reg, actual_values, alpha=.7,
            color='b') #alpha helps to show overlapping data
overlay = 'R^2 is: {}\nRMSE is: {}'.format(
                    (round(r2_score(y_test, val_preds_xgb_reg),4)),
                    (round(mean_squared_error(y_test, val_preds_xgb_reg))),4)
plt.annotate( s=overlay,xy=(5.5,2.5),size='x-large')
plt.xlabel('Predicted Price').set_color('blue')
plt.ylabel('Actual Price').set_color('red')
plt.title('SHP Regression Model')
plt.show()

Our predicted values are almost identical to the actual values and fits perfectly. This graph would be the straight line y=x if each predicted value x would be equal to each actual value y.

8. Feature importance

Apart from its superior performance, a benefit of using ensembles of decision tree methods like gradient boosting is that they can automatically provide estimates of feature importance from a trained predictive model.

Generally, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions with decision trees, the higher its relative importance.

This importance is calculated explicitly for each attribute in the dataset, allowing attributes to be ranked and compared to each other.

The importance is calculated for a single decision tree by the amount that each attribute split point improves the performance measure, weighted by the number of observations the node is responsible for. The performance measure may be the purity (Gini index) used to select the split points or another more specific error function. The feature importances are then averaged across all of the decision trees within the model.

In [102]:
ft_weights_xgb_reg = pd.DataFrame(xgb_reg.feature_importances_, columns=['weight'], index=X_train.columns)
ft_weights_xgb_reg.sort_values('weight', ascending=False, inplace=True)
ft_weights_xgb_reg.head(10)
Out[102]:
weight
cold_rent 0.565974
living_space 0.379086
city_Hamburg Eppendorf 0.009731
rooms 0.007601
postcode 0.005088
city_Hamburg Altstadt 0.004138
city_Hamburg Wellingsbüttel 0.003363
city_Rahlstedt 0.001669
city_Hamburg Harvestehude 0.001643
city_Hamburg Uhlenhorst 0.001624

We can see the most important features from top to bottom.

In [103]:
# Plotting feature importances
plt.figure(figsize=(10,25))
plt.barh(ft_weights_xgb_reg.index, ft_weights_xgb_reg.weight, align='center') 
plt.title("Feature importances in the XGBoost model", fontsize=14)
plt.xlabel("Feature importance")
plt.margins(y=0.01)
plt.show()

we can see the most important features in the plot above. We can also improve our model by removing the least important features and build the model for better accuracy and results.

9. Ridge Regularization

We can also try using Ridge Regularization to decrease the influence of less important features. Ridge Regularization is a process which shrinks the regression coefficients of less important features. We’ll once again instantiate the model. The Ridge Regularization model takes a parameter, alpha, which controls the strength of the regularization.

We’ll experiment by looping through a few different values of alpha, and see how this changes our results.

In [104]:
lr = linear_model.LinearRegression()

for i in range (-2, 3):
    alpha = 10**i
    rm = linear_model.Ridge(alpha=alpha)
    ridge_model = rm.fit(X_train, y_train)
    preds_ridge = ridge_model.predict(X_test)

    plt.scatter(preds_ridge, actual_values, alpha=.75, color='r')
    plt.xlabel('Predicted Price')
    plt.ylabel('Actual Price')
    plt.title('Ridge Regularization with alpha = {}'.format(alpha))
    overlay = 'R^2 is: {}\nRMSE is: {}'.format(
                   round(ridge_model.score(X_test, y_test), 4),
                    round(mean_squared_error(y_train, training_preds_xgb_reg),4))
    plt.annotate( s=overlay,xy=(5.5,2.5),size='x-large')
    plt.show()

These models performance doesn't improve our first model. In our case, adjusting the alpha did not substantially improve our model.

10. XG Boost with dropped columns

Let's try to improve the model by removing the city names

In [105]:
# Read dataset
df = pd.read_csv('ckdnearest(df3, df2).csv')
df.head()
Out[105]:
Unnamed: 0 living_space rooms cold_rent quarter city postcode ... longitude geometry osm_id code fclass name dist
0 0 116.4 4.5 1453.2 Farmsen-Berne Hamburg 22159 ... 10.121997 POINT (10.12199735210979 53.59779278086435) 1190663105 2205 playground NaN 0.002903
1 1 78.0 3.0 819.6 Wandsbek Hamburg 22047 ... 10.086159 POINT (10.0861590022222 53.58669136622085) 4100013663 2514 chemist Budnikowsky 0.000321
2 2 62.4 3.5 504.0 Wilstorf Hamburg 21079 ... 9.996353 POINT (9.996352831413693 53.44490810458736) 3517456812 2724 memorial Berthold Bormann 0.000318
3 3 98.4 3.5 1131.6 Rahlstedt Hamburg 22145 ... 10.145269 POINT (10.14526927662929 53.62468452926397) 6348457833 2593 vending_any NaN 0.000537
4 4 144.0 3.5 2280.0 Blankenese Hamburg 22587 ... 9.799682 POINT (9.799681528125992 53.55965182828339) 308012942 2902 bench NaN 0.000655

5 rows × 17 columns

In [106]:
# Dropping variables no longer needed
df.drop(['longitude', 'latitude', 'name', 'osm_id', 'Unnamed: 0', 'geometry', 'code', 'fclass', 'name', 'dist', 'city', 'publish_date', 'quarter' ], axis=1, inplace=True)
In [107]:
df.head()
Out[107]:
living_space rooms cold_rent postcode rent_per_square_meter
0 116.4 4.5 1453.2 22159 14.976
1 78.0 3.0 819.6 22047 12.612
2 62.4 3.5 504.0 21079 9.696
3 98.4 3.5 1131.6 22145 13.800
4 144.0 3.5 2280.0 22587 18.996
In [108]:
transformed_df = pd.get_dummies(df)
transformed_df.head()
Out[108]:
living_space rooms cold_rent postcode rent_per_square_meter
0 116.4 4.5 1453.2 22159 14.976
1 78.0 3.0 819.6 22047 12.612
2 62.4 3.5 504.0 21079 9.696
3 98.4 3.5 1131.6 22145 13.800
4 144.0 3.5 2280.0 22587 18.996
In [109]:
multi_collinearity_heatmap(transformed_df, figsize=(20,20))
In [110]:
numerical_columns = ['living_space','cold_rent','rent_per_square_meter', 'rooms', 'postcode']
In [111]:
transformed_df[numerical_columns].hist(figsize=(10,11));
In [112]:
# Separating X and y
X = transformed_df.drop('rent_per_square_meter', axis=1)
y = transformed_df.rent_per_square_meter

# Scaling
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=list(X.columns))
In [113]:
# Splitting into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
In [114]:
hpm_reg_start = time.time()

# Create instance of the model, `LinearRegression` function from 
# Scikit-Learn and fit the model on the training data:

hpm_reg = LinearRegression()  
hpm_reg.fit(X_train, y_train) #training the algorithm

# Now that the model has been fit we can make predictions by calling 
# the predict command. We are making predictions on the testing set:
training_preds_hpm_reg = hpm_reg.predict(X_train)
val_preds_hpm_reg = hpm_reg.predict(X_test)

hpm_reg_end = time.time()

print(f"Time taken to run: {round((hpm_reg_end - hpm_reg_start)/60,1)} minutes")

# Check the predictions against the actual values by using the MSE and R-2 metrics:
print("\nTraining RMSE:", round(mean_squared_error(y_train, training_preds_hpm_reg),4))
print("Validation RMSE:", round(mean_squared_error(y_test, val_preds_hpm_reg),4))
print("\nTraining r2:", round(r2_score(y_train, training_preds_hpm_reg),4))
print("Validation r2:", round(r2_score(y_test, val_preds_hpm_reg),4))
Time taken to run: 0.0 minutes

Training RMSE: 3.2989
Validation RMSE: 3.2008

Training r2: 0.8205
Validation r2: 0.8356

This model shows our features explain approximately 82.05% of the variance in our target variable which is less than the previous Xgboost model.

In [115]:
y_test_array = np.array(list(y_test))
val_preds_hpm_reg_array = np.array(val_preds_hpm_reg)
hpm_df = pd.DataFrame({'Actual': y_test_array.flatten(), 'Predicted': val_preds_hpm_reg_array.flatten()})
hpm_df
Out[115]:
Actual Predicted
0 16.212 16.149201
1 11.772 12.910540
2 9.744 9.327286
3 9.360 10.696145
4 7.044 3.993660
... ... ...
2477 10.848 12.698730
2478 12.636 13.368404
2479 13.800 14.279360
2480 8.388 10.871439
2481 12.096 11.573131

2482 rows × 2 columns

In [116]:
actual_values = y_test
plt.scatter(val_preds_hpm_reg, actual_values, alpha=.7,
            color='r') #alpha helps to show overlapping data
overlay = 'R^2 is: {}\nRMSE is: {}'.format(
                    (round(r2_score(y_test, val_preds_hpm_reg),4)),
                    (round(mean_squared_error(y_test, val_preds_hpm_reg))),4)
plt.annotate( s=overlay,xy=(5.5,2.5),size='x-large')
plt.xlabel('Predicted Price')
plt.ylabel('Actual Price')
plt.title('SHP Regression Model')
plt.show()

It doesn't look like removing the city attribute improved our model. It is almost identical as the Ridge Regularization.

Conclusion

Xgboost model we created was the best amongst all, showed good results and predicted the rent per square meter accurately. We also tried to use Ridge Regularization but it doesn't improve the model. Ridge Regularization results are almost similar to the XG Boost with dropped columns results.

We could also use other regression models and compare them. In our case Xgboost showed accurate results, the features explained approximately 99.88% of the variation in price with an RMSE of 73.4.

The other shapefiles could also be downloaded and used for better analysis. There are other shapefiles in the Geofabrik website such as railways, bus stops, buildings nearby or lakes etc.

In [ ]: